Old Parquet files with wrong Compressed Size not Readable #2926

Open
pyckle opened this issue Jun 23, 2024 · 0 comments


pyckle commented Jun 23, 2024

In certain circumstances, the CLI fails to read old (perhaps ancient) Parquet files whose column metadata has an incorrect compressed_size field, one that does not include the dictionary page (at least according to the comment in the code). The code that is supposed to handle this case does not flip the byte buffer into which it reads the extra bytes. It appears to have been broken for a few years now.
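To illustrate why the missing flip matters, here is a minimal sketch using only `java.nio.ByteBuffer`. It is not the parquet-java code; the class `FlipDemo` and the method `fill` are hypothetical names made up for this example. It shows that a buffer that has just been written into has nothing left to read until `flip()` is called:

```java
import java.nio.ByteBuffer;

// Hypothetical demo class; not part of parquet-java.
class FlipDemo {
    // Copies `extra` into a fresh buffer, the way a reader might fetch
    // bytes that were missing from an earlier read.
    static ByteBuffer fill(byte[] extra, boolean flip) {
        ByteBuffer buf = ByteBuffer.allocate(extra.length);
        buf.put(extra); // position now equals limit, so nothing is readable
        if (flip) {
            buf.flip(); // limit = old position, position = 0: ready to read
        }
        return buf;
    }

    public static void main(String[] args) {
        byte[] extra = {10, 20, 30};
        System.out.println(fill(extra, false).remaining()); // 0
        System.out.println(fill(extra, true).remaining());  // 3
    }
}
```

Without the flip, any consumer that reads from the buffer's current position sees an empty buffer, which is consistent with the read failure described above.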

I have written a PR that includes a defective Parquet file exhibiting this issue and a unit test that fails without the additional flip, and I have validated that the code works once the flip is added.

This is a minor issue that I found while learning the code rather than while addressing a production problem, so there's no urgency.

pyckle added a commit to pyckle/parquet-java that referenced this issue Jun 23, 2024