@raulcd raulcd commented Nov 21, 2025

Rationale for this change

The Parquet C++ implementation carries some column chunk metadata that is not yet exposed in PyArrow. This PR exposes it.

What changes are included in this PR?

Expose the following members already present on parquet::ColumnChunkMetaData:

  • has_index_page
  • index_page_offset
  • bloom_filter_offset
  • bloom_filter_length
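
As a rough illustration of how the four fields above might be read from Python, here is a minimal sketch. The helper name `new_column_chunk_fields` and the `fake_chunk` stand-in are hypothetical and not part of this PR; with a real file the object would come from `pq.ParquetFile(path).metadata.row_group(0).column(0)`. The helper uses `getattr` with a default so it also runs on PyArrow builds that predate this change.

```python
from types import SimpleNamespace

# Names of the four attributes this PR exposes on a column chunk.
NEW_FIELDS = (
    "has_index_page",
    "index_page_offset",
    "bloom_filter_offset",
    "bloom_filter_length",
)


def new_column_chunk_fields(column_chunk):
    """Collect the newly exposed fields into a dict (hypothetical helper).

    Missing attributes map to None, so the helper does not fail on
    PyArrow versions where these fields are not available yet.
    """
    return {name: getattr(column_chunk, name, None) for name in NEW_FIELDS}


# Stand-in object for illustration only; it mimics a column chunk that
# has the index page fields but no bloom filter fields.
fake_chunk = SimpleNamespace(has_index_page=False, index_page_offset=0)
fields = new_column_chunk_fields(fake_chunk)
```

On a build that includes this PR, passing a real `ColumnChunkMetaData` object should fill in all four keys with whatever the C++ layer reports.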

Are these changes tested?

Yes

Are there any user-facing changes?

Yes, new attributes on pyarrow.parquet.ColumnChunkMetaData.

…a about index_page and bloom_filter to Python
@github-actions
⚠️ GitHub issue #48199 has been automatically assigned in GitHub to PR creator.

assert metadata.row_group(0).column(0).has_dictionary_page is True
assert metadata.row_group(0).column(0).dictionary_page_offset == 4
assert metadata.row_group(0).column(0).has_index_page is True
assert metadata.row_group(0).column(0).index_page_offset == 0
@raulcd raulcd Nov 21, 2025

It feels weird that has_index_page is True while index_page_offset is 0, but this is a C++ issue (or a file issue); we are just acting as a pass-through here.
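
For callers who want to guard against this combination on their side, one possible sketch follows. The helper `usable_index_page_offset` is hypothetical and not part of this PR. It assumes that an offset of 0 cannot point at a real page, because a Parquet file starts with the 4-byte `PAR1` magic; that interpretation is an assumption of this sketch, not something the bindings enforce.

```python
from types import SimpleNamespace


def usable_index_page_offset(column_chunk):
    """Return index_page_offset only when it looks meaningful, else None.

    Assumption: offset 0 is treated as "not set", since the file begins
    with the PAR1 magic bytes and no page can start there.
    """
    if not getattr(column_chunk, "has_index_page", False):
        return None
    offset = column_chunk.index_page_offset
    return offset if offset > 0 else None


# Stand-in mirroring the values asserted in the test above.
chunk = SimpleNamespace(has_index_page=True, index_page_offset=0)
result = usable_index_page_offset(chunk)
```

The bindings themselves stay a plain pass-through; a check like this would live in user code.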

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Nov 21, 2025

raulcd commented Nov 21, 2025

@github-actions crossbow submit -g python

@github-actions
Revision: 26b9b63

Submitted crossbow builds: ursacomputing/crossbow @ actions-e4a9ff08ad

Task Status
example-python-minimal-build-fedora-conda GitHub Actions
example-python-minimal-build-ubuntu-venv GitHub Actions
test-conda-python-3.10 GitHub Actions
test-conda-python-3.10-hdfs-2.9.2 GitHub Actions
test-conda-python-3.10-hdfs-3.2.1 GitHub Actions
test-conda-python-3.10-pandas-1.3.4-numpy-1.21.2 GitHub Actions
test-conda-python-3.11 GitHub Actions
test-conda-python-3.11-dask-latest GitHub Actions
test-conda-python-3.11-dask-upstream_devel GitHub Actions
test-conda-python-3.11-hypothesis GitHub Actions
test-conda-python-3.11-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.11-spark-master GitHub Actions
test-conda-python-3.12 GitHub Actions
test-conda-python-3.12-cpython-debug GitHub Actions
test-conda-python-3.12-pandas-latest-numpy-1.26 GitHub Actions
test-conda-python-3.12-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.13 GitHub Actions
test-conda-python-3.13-pandas-nightly-numpy-nightly GitHub Actions
test-conda-python-3.13-pandas-upstream_devel-numpy-nightly GitHub Actions
test-conda-python-3.14 GitHub Actions
test-conda-python-emscripten GitHub Actions
test-cuda-python-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-debian-12-python-3-amd64 GitHub Actions
test-debian-12-python-3-i386 GitHub Actions
test-fedora-42-python-3 GitHub Actions
test-ubuntu-22.04-python-3 GitHub Actions
test-ubuntu-22.04-python-313-freethreading GitHub Actions
test-ubuntu-24.04-python-3 GitHub Actions

@raulcd raulcd marked this pull request as ready for review November 21, 2025 13:28
@raulcd raulcd requested review from AlenkaF and rok as code owners November 21, 2025 13:28
@rok rok left a comment

Looks good.

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting changes Awaiting changes labels Nov 21, 2025
2 participants