GH-48199: [Python][Parquet] Expose existing Parquet C++ metadata about index_page and bloom_filter to Python #48201
assert metadata.row_group(0).column(0).has_dictionary_page is True
assert metadata.row_group(0).column(0).dictionary_page_offset == 4
assert metadata.row_group(0).column(0).has_index_page is True
assert metadata.row_group(0).column(0).index_page_offset == 0
It feels weird that has_index_page is True while index_page_offset is 0, but that is a C++ issue (or a file issue); we are just acting as a pass-through here.
@github-actions crossbow submit -g python
Revision: 26b9b63
Submitted crossbow builds: ursacomputing/crossbow @ actions-e4a9ff08ad
rok left a comment
Looks good.
Rationale for this change
The Parquet C++ implementation already carries some column chunk metadata that is not exposed in PyArrow. This PR exposes it.
What changes are included in this PR?
Expose existing parquet::ColumnChunkMetaData metadata about index_page and bloom_filter to Python.
Are these changes tested?
Yes
Are there any user-facing changes?
Yes, new API methods on pyarrow.parquet.ColumnChunkMetaData.
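For readers who want to try the new attributes, here is a minimal sketch of how the index-page fields might be read defensively. Only `has_index_page` and `index_page_offset` come from the test in this PR. The helper `describe_index_page`, the `looks_inconsistent` flag, and the `SimpleNamespace` stand-in are hypothetical and exist only for illustration. The flag encodes the reviewer's remark about a page reported as present at offset 0; it is a heuristic, not documented Parquet semantics.

```python
from types import SimpleNamespace


def describe_index_page(column_chunk):
    """Summarize index-page metadata for one column chunk.

    Hypothetical helper. `has_index_page` and `index_page_offset` are the
    attribute names used in this PR's test; PyArrow builds without this
    change may lack them, so missing attributes are reported as None.
    """
    has_page = getattr(column_chunk, "has_index_page", None)
    offset = getattr(column_chunk, "index_page_offset", None)
    # Assumption: "present, but at offset 0" is worth flagging, following
    # the review comment. PyArrow only passes through what C++ reports.
    looks_inconsistent = bool(has_page) and offset == 0
    return {
        "has_index_page": has_page,
        "index_page_offset": offset,
        "looks_inconsistent": looks_inconsistent,
    }


# Stand-in object mirroring the values asserted in the PR's test.
chunk = SimpleNamespace(has_index_page=True, index_page_offset=0)
summary = describe_index_page(chunk)
print(summary)
```

With a real file, the same helper could be given `pyarrow.parquet.ParquetFile(path).metadata.row_group(0).column(0)`, where `path` is whatever Parquet file you have at hand.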