Arrow recently introduced the FixedShapeTensor and VariableShapeTensor canonical extension types, which use FixedSizeList and StructArray(List, FixedSizeList) as storage, respectively. These are targeted at machine learning and scientific applications that deal with large datasets and would benefit from using Parquet as on-disk storage.
If Arrow's List were stored as BYTE_ARRAY, we would likely reduce the overhead of reading and writing definition and repetition levels. See discussion here. It would therefore be beneficial to introduce a VARIABLE_SIZE_LIST logical type to Parquet.
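To make the overhead concrete, here is a minimal pure-Python sketch of the bookkeeping involved. It assumes the simplest case, a required list of required int32 elements (max repetition level 1, max definition level 1), and the helper names `list_levels` and `pack_rows` are hypothetical, not part of any Parquet or Arrow API. It only illustrates the idea and does not reflect how any Parquet implementation encodes levels on disk.

```python
import struct

def list_levels(rows):
    """Repetition/definition levels for a required list of required int32 elements.

    Assumption: one nesting level, so both levels are either 0 or 1.
    """
    rep, defn, values = [], [], []
    for row in rows:
        if not row:
            # An empty list still emits one level pair, but no value.
            rep.append(0)
            defn.append(0)
            continue
        for i, v in enumerate(row):
            rep.append(0 if i == 0 else 1)  # 0 starts a new row, 1 continues it
            defn.append(1)                  # the element is present
            values.append(v)
    return rep, defn, values

def pack_rows(rows):
    """Pack each row into one opaque little-endian int32 blob (BYTE_ARRAY-like)."""
    return [struct.pack(f"<{len(row)}i", *row) for row in rows]

rows = [[1, 2, 3], [], [4, 5]]
rep, defn, values = list_levels(rows)
blobs = pack_rows(rows)

# List storage: one level pair per element (plus one per empty row).
print(len(rep), "level pairs for", len(values), "values")
# Blob storage: one length-prefixed value per row, no per-element levels.
print([len(b) for b in blobs], "bytes per row")
```

With list storage the reader and writer process a level pair for every element, whereas the blob layout needs only one entry per row. Real files encode levels compactly, so the saving is probably more in decode work than in file size, but that would need to be measured.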
I find it a bit hard for Parquet to optimize tensors; the problem may be the rep/def levels for tensors versus fixed-length byte arrays. I could try adding a fast check of the rep/def levels for this type.