
Is there any actual conversion implementation for arrow and parquet? #2928

Open
chenyuanxing opened this issue Jun 24, 2024 · 5 comments
@chenyuanxing

We found that the parquet-arrow module contains only schema conversions, so we wanted to ask whether there is any code that performs the actual data conversion between Parquet and Arrow.

@chenyuanxing chenyuanxing changed the title Is there an actual conversion implementation for arrow and parquet? Is there any actual conversion implementation for arrow and parquet? Jun 24, 2024
@wgtmac
Member

wgtmac commented Jun 24, 2024

If you are able to use C++, I think parquet-cpp in Apache Arrow is the best solution for your case: https://arrow.apache.org/docs/cpp/parquet.html

@chenyuanxing
Author

Yes, we know there is a C++ implementation, but I was wondering whether there is a corresponding implementation for Java, since all of our code is Java.

@chenyuanxing
Author

The parquet-arrow library looks like it is meant to do this, but I don't know why it only ever covers the schema part.

@wgtmac
Member

wgtmac commented Jun 25, 2024

I think conversion between parquet and arrow is a valid use case. parquet-java provides built-in row-level interfaces for avro/thrift/protobuf. Other Java Parquet implementations (Presto/Trino/Spark) simply leverage the page and metadata reader/writer from this library to build their own extensions. Native arrow support would be a welcome extension to this library, IMO.

@chenyuanxing
Author

So the parquet-arrow library hasn't actually been used for this yet, because it only has schema mappings?

We've also looked at the conversions in Spark, which are missing some types, such as uint, due to limitations in Spark. So it's not really a universal conversion.
