invalid encoding: PLAIN_DICTIONARY #108

Closed
ekschro opened this issue Jun 30, 2020 · 8 comments

@ekschro

ekschro commented Jun 30, 2020

The Issue

When trying to read a test Parquet file fetched from an S3 bucket, I get an invalid encoding: PLAIN_DICTIONARY error. This comes after hitting an invalid parquet version error several times because of corrupt files, so I think the file is now being recognized as a Parquet file but is not being read correctly. Is there anything I am doing wrong?

The Code

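// Assumption: the library is loaded as `parquet`; the package name below is a guess.
const parquet = require('parquetjs');

(async () => {
  try {
    const reader = await parquet.ParquetReader.openFile('./fetched3.parquet');
    const cursor = reader.getCursor();

    // cursor.next() resolves to null once every row has been read
    let record = null;
    while ((record = await cursor.next())) {
      console.log(record);
    }

    await reader.close();
  } catch (err) {
    console.error(err);
  }
})();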
@ekschro
Author

ekschro commented Jul 1, 2020

I just realized that PLAIN and PLAIN_DICTIONARY are two different forms of encoding.

Are there any plans to support PLAIN_DICTIONARY encoding in the future?
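
To make the difference concrete, here is a rough sketch of the two layouts in plain JavaScript. It is only an illustration: the function names and the page shape are made up for this example, and real Parquet pages add headers, bit-packing and run-length encoding on top of the idea shown here.

```javascript
// Illustrative only: hypothetical helpers, not the API of any Parquet library.

// PLAIN: every value is written out in order, repeats included.
function encodePlain(values) {
  return { encoding: 'PLAIN', values: [...values] };
}

// PLAIN_DICTIONARY (simplified): each distinct value is stored once in a
// dictionary, and the column data becomes a list of indices into it.
function encodeDictionary(values) {
  const dictionary = [];
  const indexOf = new Map();
  const indices = values.map((v) => {
    if (!indexOf.has(v)) {
      indexOf.set(v, dictionary.length);
      dictionary.push(v);
    }
    return indexOf.get(v);
  });
  return { encoding: 'PLAIN_DICTIONARY', dictionary, indices };
}

// A reader needs a branch for each layout. One that only knows PLAIN
// cannot interpret the indices, so it has to reject the page.
function decode(page) {
  switch (page.encoding) {
    case 'PLAIN':
      return page.values;
    case 'PLAIN_DICTIONARY':
      return page.indices.map((i) => page.dictionary[i]);
    default:
      throw new Error(`invalid encoding: ${page.encoding}`);
  }
}

const column = ['us', 'us', 'de', 'us', 'de'];
const page = encodeDictionary(column);
console.log(page.dictionary); // [ 'us', 'de' ]
console.log(page.indices);    // [ 0, 0, 1, 0, 1 ]
```

A reader without the PLAIN_DICTIONARY branch would fall through to the default case, which is roughly the kind of error reported above.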

@zeitiger

I would be interested in this too

@ekschro
Author

ekschro commented Oct 18, 2021

Hey @zeitiger - Did you ever find a workaround for this?

@mattfysh

I'm also getting this error when trying to read a Parquet file created by AWS Data Wrangler (now AWS SDK for pandas). No solution yet.

@hackermondev

Any updates on this?

@valdo404

For your information, the lib does not support RLE_DICTIONARY either. The workaround was to re-encode the file to PLAIN.

@valdo404

Also, it does not support float8 data types.

@ekschro
Author

ekschro commented Sep 26, 2024

Hello everyone - I hope your last 4 years have been pleasant!

I find myself once again doing some work with Parquet files in Node.

The following package supports PLAIN_DICTIONARY 🎉

https://github.com/hyparam/hyparquet

Closing this issue!

ekschro closed this as completed Sep 26, 2024