[Format] Specify FIXED_SIZE_LIST Logical type #430

Open
asfimport opened this issue May 15, 2024 · 0 comments

Replicated from mailing list

Arrow recently introduced the FixedShapeTensor and VariableShapeTensor canonical extension types, which use FixedSizeList and StructArray(List, FixedSizeList) as storage, respectively. They target machine learning and scientific applications that deal with large datasets and would benefit from using Parquet as on-disk storage.

However, Parquet currently stores FixedSizeList as List, which adds significant conversion overhead when reading and writing, as discussed here. It would therefore be beneficial to introduce a FIXED_SIZE_LIST logical type to Parquet.
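
To illustrate where the overhead may come from, here is a deliberately simplified sketch in plain Python. It is not Parquet's actual encoder: it models only repetition levels for a single non-null list column and ignores definition levels, nulls, empty lists, and paging. The function names (`encode_as_list`, `decode_list`, `decode_fixed_size`) are hypothetical and exist only for this example. The point is that a generic List has to carry and scan per-value level information to find row boundaries, whereas a fixed-size list could in principle be rebuilt from the flat values and a single `list_size`.

```python
from typing import List, Sequence, Tuple


def encode_as_list(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[int]]:
    """Flatten rows into values plus one repetition level per value.

    Level 0 marks the first value of a new row, level 1 a continuation.
    Simplification: every row is assumed non-empty and non-null.
    """
    values: List[float] = []
    rep_levels: List[int] = []
    for row in rows:
        for i, value in enumerate(row):
            values.append(value)
            rep_levels.append(0 if i == 0 else 1)
    return values, rep_levels


def decode_list(values: Sequence[float], rep_levels: Sequence[int]) -> List[List[float]]:
    """Rebuild rows by scanning every repetition level."""
    rows: List[List[float]] = []
    for value, level in zip(values, rep_levels):
        if level == 0:
            rows.append([value])      # a new row starts here
        else:
            rows[-1].append(value)    # continue the current row
    return rows


def decode_fixed_size(values: Sequence[float], list_size: int) -> List[List[float]]:
    """Rebuild rows from the flat values alone, given a known list size."""
    if list_size <= 0 or len(values) % list_size != 0:
        raise ValueError("value count must be a positive multiple of list_size")
    return [list(values[i:i + list_size]) for i in range(0, len(values), list_size)]


rows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
values, rep_levels = encode_as_list(rows)
```

In this model, `decode_list(values, rep_levels)` and `decode_fixed_size(values, 3)` return the same rows, but only the first needs the extra `rep_levels` sequence, which for fixed-size data is entirely predictable from the list size.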

Reporter: Rok Mihevc

PRs and other links:

Note: This issue was originally created as PARQUET-2474. Please see the migration documentation for further details.
