The toParquetMetadata method converts org.apache.parquet.hadoop.metadata.ParquetMetadata to org.apache.parquet.format.FileMetaData, but it does not set the dictionary page offset bit in FileMetaData.
When a FileMetaData object is serialized into the footer and later deserialized, the dictionary page offset is lost because the dictionary page offset bit was never set.
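The loss described above is consistent with how optional fields behave in Thrift-style serialization: an optional field is written only when it has been marked as set. Below is a minimal sketch of that round trip. The names ColumnMeta, serialize and deserialize, and the map used as a wire format, are hypothetical stand-ins for illustration, not the generated org.apache.parquet.format classes.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for a Thrift-style struct with one optional field.
// This is NOT generated Thrift code and NOT the real parquet-mr API.
class OptionalFieldSketch {

    static final class ColumnMeta {
        private long dictionaryPageOffset;
        private boolean dictionaryPageOffsetSet; // models the per-field "is set" bit

        void setDictionaryPageOffset(long value) {
            dictionaryPageOffset = value;
            dictionaryPageOffsetSet = true; // only the setter turns the bit on
        }

        boolean isSetDictionaryPageOffset() {
            return dictionaryPageOffsetSet;
        }

        long getDictionaryPageOffset() {
            return dictionaryPageOffset;
        }
    }

    /** Writes the optional field only when its "is set" bit is on. */
    static Map<String, Long> serialize(ColumnMeta meta) {
        Map<String, Long> wire = new HashMap<>();
        if (meta.isSetDictionaryPageOffset()) {
            wire.put("dictionary_page_offset", meta.getDictionaryPageOffset());
        }
        return wire;
    }

    /** Restores the field only if it was present on the wire. */
    static ColumnMeta deserialize(Map<String, Long> wire) {
        ColumnMeta meta = new ColumnMeta();
        Long offset = wire.get("dictionary_page_offset");
        if (offset != null) {
            meta.setDictionaryPageOffset(offset);
        }
        return meta;
    }

    public static void main(String[] args) {
        ColumnMeta neverSet = new ColumnMeta(); // the converter skipped the setter
        ColumnMeta restored = deserialize(serialize(neverSet));
        System.out.println(restored.isSetDictionaryPageOffset());
    }
}
```

In this sketch, a struct whose setter was never called round-trips with the field still unset, which is the symptom the issue reports for the footer.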
PARQUET-1850 attempted to fix this problem, but the fix was only partial.
It calls setDictionary_page_offset only if getEncodingStats() is non-null and reports dictionary pages:
if (columnMetaData.getEncodingStats() != null
    && columnMetaData.getEncodingStats().hasDictionaryPages()) {
  metaData.setDictionary_page_offset(columnMetaData.getDictionaryPageOffset());
}
However, it should also call setDictionary_page_offset when getEncodingStats() returns null but the encodings are present.
It should use the implementation in ColumnChunkMetaData below:
So the new change in ParquetMetadataConverter should look like:
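One way to picture the requested behavior is the self-contained sketch below. It is a hedged illustration, not the actual patch. ColumnChunk, hasDictionaryPage and dictionaryOffsetToWrite are hypothetical stand-ins rather than parquet-mr classes or methods. The fallback check for PLAIN_DICTIONARY or RLE_DICTIONARY is an assumption about how "encodings are present" would be tested when encoding stats are missing.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative stand-ins only; these are NOT the real parquet-mr classes.
class DictionaryOffsetSketch {

    enum Encoding { PLAIN, RLE, PLAIN_DICTIONARY, RLE_DICTIONARY }

    /** Stand-in for the in-memory column chunk metadata. */
    static final class ColumnChunk {
        final Boolean statsHasDictionaryPages; // null models missing encoding stats
        final Set<Encoding> encodings;
        final long dictionaryPageOffset;

        ColumnChunk(Boolean statsHasDictionaryPages, Set<Encoding> encodings,
                    long dictionaryPageOffset) {
            this.statsHasDictionaryPages = statsHasDictionaryPages;
            this.encodings = encodings;
            this.dictionaryPageOffset = dictionaryPageOffset;
        }
    }

    /** Prefers encoding stats when present; otherwise falls back to the encodings set. */
    static boolean hasDictionaryPage(ColumnChunk chunk) {
        if (chunk.statsHasDictionaryPages != null) {
            return chunk.statsHasDictionaryPages;
        }
        // Assumption: either dictionary encoding implies a dictionary page.
        return chunk.encodings.contains(Encoding.PLAIN_DICTIONARY)
            || chunk.encodings.contains(Encoding.RLE_DICTIONARY);
    }

    /** Returns the offset to write into the footer, or null to leave the field unset. */
    static Long dictionaryOffsetToWrite(ColumnChunk chunk) {
        return hasDictionaryPage(chunk) ? Long.valueOf(chunk.dictionaryPageOffset) : null;
    }

    public static void main(String[] args) {
        // No encoding stats, but a dictionary encoding is listed: the offset is kept.
        ColumnChunk noStats = new ColumnChunk(null, EnumSet.of(Encoding.RLE_DICTIONARY), 4L);
        // No encoding stats and no dictionary encoding: the field stays unset.
        ColumnChunk plainOnly = new ColumnChunk(null, EnumSet.of(Encoding.PLAIN), 0L);
        System.out.println(dictionaryOffsetToWrite(noStats));
        System.out.println(dictionaryOffsetToWrite(plainOnly));
    }
}
```

Under these assumptions, the stats-only check from PARQUET-1850 corresponds to the first branch of hasDictionaryPage, and the second branch covers files whose column chunks carry encodings but no encoding stats.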
Reporter: Abhishek Dixit
PRs and other links:
Note: This issue was originally created as PARQUET-2464. Please see the migration documentation for further details.