Skip to content

Extract shared ParquetReadPlan for leaf column resolution#20913

Open
friendlymatthew wants to merge 2 commits intoapache:mainfrom
pydantic:pr-20854-followup
Open

Extract shared ParquetReadPlan for leaf column resolution#20913
friendlymatthew wants to merge 2 commits intoapache:mainfrom
pydantic:pr-20854-followup

Conversation

@friendlymatthew
Copy link
Contributor

Note: please review from the second commit. This PR is currently based off of #20854

Rationale for this change

This PR extracts a reusable build_parquet_read_plan function and ParquetReadPlan struct from the row filter's FilterCandidateBuilder::build

Today, struct-aware leaf projection only exists in the row filter path. The opener still uses ProjectionMask::roots which reads all leaves of a struct even when only one field is needed. By pulling this logic into a shared abstraction, the opener can reduce unnecessary I/O for struct-heavy schemas

@github-actions github-actions bot added the datasource Changes to the datasource crate label Mar 12, 2026
Copy link
Contributor

@adriangb adriangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it the idea is that this can be re-used for the projection side of things because it was written in a reusable manner, even thought at the moment it is only used for the row filter?

I generally like this change :)

|| self.is_nested_type_supported(return_type)
{
// try to resolve all field name arguments to strinrg literals
// if any argument is not a string literal, we can not determine the exact
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under what circumstances would get_field accept non-string literal arguments? Would this be a column e.g. get_field(struct_col, fields_col)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

datasource Changes to the datasource crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants