fix: [HUDI-8371] Fix Column Stats Record Key using full partition path#18327

Open
linliu-code wants to merge 2 commits into apache:branch-0.x from linliu-code:pr-18315-1

@linliu-code
Collaborator

Describe the issue this Pull Request addresses

When the column stats index is enabled on a table that already has the FILES metadata partition initialized, listAllPartitionsFromMDT is used to bootstrap the column stats partition. The method passed the absolute partition path (e.g., hdfs://host/table/partition1) as the first argument to DirectoryInfo instead of the relative path (e.g., partition1). As a result, the column stats index was keyed on the wrong paths, and column stats lookups during data skipping returned empty or incorrect results.

Summary and Changelog

Fix: In HoodieBackedTableMetadataWriter.listAllPartitionsFromMDT, compute the relative partition path using FSUtils.getRelativePartitionPath(basePath, absolutePath) before constructing each DirectoryInfo, instead of passing the absolute map key directly.

Changes:

HoodieBackedTableMetadataWriter.java: Fixed listAllPartitionsFromMDT to use relative partition paths when constructing DirectoryInfo entries.
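
For illustration only, here is a minimal standalone sketch of the idea behind the fix. It does not use Hudi classes: `relativePartitionPath` is a hypothetical string-based stand-in for `FSUtils.getRelativePartitionPath`, and the `HashMap` is a deliberately simplified model of an index keyed by partition path, not the actual column stats record key layout.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch; class and method names are hypothetical, not Hudi APIs.
final class RelativePartitionPathSketch {

  // Hypothetical stand-in for FSUtils.getRelativePartitionPath(basePath, absolutePath).
  // Strips the table base path so only the partition-relative part remains.
  static String relativePartitionPath(String basePath, String absolutePartitionPath) {
    String base = stripTrailingSlash(basePath);
    String full = stripTrailingSlash(absolutePartitionPath);
    if (full.equals(base)) {
      return ""; // the partition is the table root itself
    }
    if (!full.startsWith(base + "/")) {
      throw new IllegalArgumentException(
          "Partition path " + full + " is not under base path " + base);
    }
    return full.substring(base.length() + 1);
  }

  static String stripTrailingSlash(String path) {
    return path.endsWith("/") ? path.substring(0, path.length() - 1) : path;
  }

  public static void main(String[] args) {
    String basePath = "hdfs://host/table";
    String absolutePath = "hdfs://host/table/partition1";

    // Simplified model: an index keyed by partition path.
    Map<String, String> statsByPartition = new HashMap<>();

    // Keying on the relative path, as the fix does before building DirectoryInfo.
    statsByPartition.put(relativePartitionPath(basePath, absolutePath), "stats");

    // A reader that looks up by relative partition path finds the entry;
    // had the absolute path been used as the key, this lookup would miss.
    System.out.println(statsByPartition.containsKey("partition1"));
    System.out.println(statsByPartition.containsKey(absolutePath));
  }
}
```

In this simplified model, a lookup by `partition1` only succeeds when the key was derived from the relative path, which mirrors the mismatch described above.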

Impact

No public API or config changes. Users who enable column stats on an existing table (i.e., the FILES partition was already initialized but the column stats partition was not) will now get a correctly populated column stats index, so data skipping works as expected instead of silently returning no stats.

Risk Level

Low

Documentation Update

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@linliu-code marked this pull request as ready for review March 16, 2026 12:57
@github-actions github-actions bot added the size:M PR with lines of changes in (100, 300] label Mar 16, 2026