fix: [HUDI-8371] Fix Column Stats Record Key using full partition path #18327
Open
linliu-code wants to merge 2 commits into apache:branch-0.x from
Describe the issue this Pull Request addresses
When the column stats index is enabled on a table whose FILES metadata partition is already initialized, listAllPartitionsFromMDT is used to bootstrap the column stats partition. That method passed the absolute partition path (e.g., hdfs://host/table/partition1) as the first argument to DirectoryInfo instead of the relative path (e.g., partition1). As a result, column stats records were keyed on the wrong partition paths, and column stats lookups during data skipping returned empty or incorrect results.
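To illustrate why the wrong key breaks lookups, here is a minimal sketch that models the column stats index as a plain map keyed by partition path plus file name. This is a deliberate simplification: Hudi's real column stats record keys are encoded differently, and the class and `statsKey` helper below are hypothetical. The point is only that an entry written under the absolute path cannot be found by a reader that uses the relative path.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the key mismatch; not Hudi's actual key encoding.
public class ColumnStatsKeyMismatch {

    // Hypothetical key builder: partition path + "/" + file name.
    static String statsKey(String partitionPath, String fileName) {
        return partitionPath + "/" + fileName;
    }

    public static void main(String[] args) {
        Map<String, Long> index = new HashMap<>();

        // Bootstrap (before the fix) wrote the entry using the absolute partition path.
        index.put(statsKey("hdfs://host/table/partition1", "f1.parquet"), 42L);

        // Data skipping looks the entry up using the relative partition path.
        Long stats = index.get(statsKey("partition1", "f1.parquet"));

        // Prints "null": the entry exists, but under a key the reader never asks for.
        System.out.println(stats);
    }
}
```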
Summary and Changelog
Fix: In HoodieBackedTableMetadataWriter.listAllPartitionsFromMDT, compute the relative partition path using FSUtils.getRelativePartitionPath(basePath, absolutePath) before constructing each DirectoryInfo, instead of passing the absolute map key directly.
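The shape of the fix can be sketched as follows, assuming the listing is available as a map from absolute partition path to file names. Everything here is a hypothetical stand-in: `DirectoryInfo` is reduced to a relative path plus file names (the real class carries more state), and `relativePartitionPath` only approximates what FSUtils.getRelativePartitionPath does by stripping the base path prefix. The essential step is converting each absolute map key before constructing the DirectoryInfo.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RelativePartitionPathSketch {

    // Hypothetical, simplified stand-in for DirectoryInfo (relative path + file names only).
    static final class DirectoryInfo {
        final String relativePath;
        final List<String> fileNames;

        DirectoryInfo(String relativePath, List<String> fileNames) {
            this.relativePath = relativePath;
            this.fileNames = fileNames;
        }
    }

    // Hypothetical helper approximating FSUtils.getRelativePartitionPath(basePath, absolutePath):
    // strips the base path and the following separator.
    static String relativePartitionPath(String basePath, String absolutePath) {
        String base = basePath.endsWith("/") ? basePath.substring(0, basePath.length() - 1) : basePath;
        if (absolutePath.equals(base)) {
            return ""; // assumption: a non-partitioned table maps to the empty partition path
        }
        if (!absolutePath.startsWith(base + "/")) {
            throw new IllegalArgumentException(absolutePath + " is not under " + base);
        }
        return absolutePath.substring(base.length() + 1);
    }

    static List<DirectoryInfo> toDirectoryInfos(String basePath, Map<String, List<String>> filesByAbsolutePath) {
        List<DirectoryInfo> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : filesByAbsolutePath.entrySet()) {
            // The fix: use the relative path, not the absolute map key.
            String relative = relativePartitionPath(basePath, e.getKey());
            result.add(new DirectoryInfo(relative, e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> files = new LinkedHashMap<>();
        files.put("hdfs://host/table/partition1", List.of("f1.parquet"));
        files.put("hdfs://host/table/year=2024/month=01", List.of("f2.parquet"));
        for (DirectoryInfo d : toDirectoryInfos("hdfs://host/table", files)) {
            // Prints "partition1 -> [f1.parquet]", then "year=2024/month=01 -> [f2.parquet]".
            System.out.println(d.relativePath + " -> " + d.fileNames);
        }
    }
}
```

Nested partition paths such as year=2024/month=01 keep their internal separators; only the base path prefix is removed.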
Changes:
HoodieBackedTableMetadataWriter.java: Fixed listAllPartitionsFromMDT to use relative partition paths when constructing DirectoryInfo entries.
Impact
No public API or config changes. Users who enable column stats on an existing table (i.e., the FILES partition is already initialized but the column stats partition is not) will now get a correctly populated column stats index, so data skipping works as expected instead of silently returning no stats.
Risk Level
Low
Documentation Update
Contributor's checklist