Choosing the Writer Version

Writer Version

In parquet-java, writerVersion selects which version of the Parquet format the writer targets. The setting determines the default column encodings and the data page format used during the write. The compression codec is configured separately and does not depend on the writer version.

Writer Version 1.0

Ensures compatibility with older readers that only support the original Parquet format.
Uses the basic encodings, such as PLAIN and dictionary encoding.
Uses the original data page format (Data Page V1).
Commonly paired with widely supported codecs such as Snappy and Gzip, although the codec is chosen independently of the writer version.

Full example here

  "options": {
    "writerVersion": "1.0"
  }

Writer Version 2.0

May not be readable by older readers, but enables format features aimed at newer systems.
Introduces delta encodings (for example DELTA_BINARY_PACKED for integers and DELTA_LENGTH_BYTE_ARRAY for byte arrays), which reduce the encoded size of sorted or slowly changing values.
Uses the Data Page V2 format, whose page header carries extra metadata (row and null counts) and which stores repetition and definition levels uncompressed, separately from the values.
Often paired with newer codecs such as Zstandard (ZSTD), Brotli, or LZ4 for better compression ratios. Codec support depends on the reader, not on the writer version.
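
To give a feel for why delta encodings help, here is a minimal Python sketch of the underlying idea only. It is not the actual DELTA_BINARY_PACKED layout, which works on blocks and miniblocks with bit-packed deltas, and the function names are made up for this illustration.

```python
def delta_encode(values):
    """Return (first_value, deltas) for a sequence of integers."""
    if not values:
        return None, []
    deltas = [b - a for a, b in zip(values, values[1:])]
    return values[0], deltas


def delta_decode(first, deltas):
    """Rebuild the original sequence from the first value and the deltas."""
    if first is None:
        return []
    out = [first]
    for d in deltas:
        out.append(out[-1] + d)
    return out


def bits_needed(numbers):
    """Bits required to hold the largest magnitude in `numbers`."""
    return max((abs(n).bit_length() for n in numbers), default=0)


# Illustrative data: timestamps spaced 5 seconds apart.
timestamps = [1_700_000_000 + 5 * i for i in range(8)]
first, deltas = delta_encode(timestamps)

print("bits per raw value:", bits_needed(timestamps))
print("bits per delta:    ", bits_needed(deltas))
```

Because neighbouring values are close together, each delta fits in far fewer bits than the raw value, which is the effect the delta encodings exploit for sorted or slowly changing columns.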

Full example here

  "options": {
    "writerVersion": "2.0"
  }
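
If the configuration file is generated by a script, a small helper can guard against unsupported values and emit strict JSON. The sketch below is only an illustration: the helper name is hypothetical, only the "options" and "writerVersion" keys and the values "1.0" and "2.0" come from this page, and where the fragment sits inside the full file is an assumption.

```python
import json

# Values documented on this page; anything else is rejected.
SUPPORTED_WRITER_VERSIONS = ("1.0", "2.0")


def writer_options(writer_version):
    """Return an "options" fragment for the given writer version (hypothetical helper)."""
    if writer_version not in SUPPORTED_WRITER_VERSIONS:
        raise ValueError(f"unsupported writerVersion: {writer_version!r}")
    return {"options": {"writerVersion": writer_version}}


# json.dumps never emits trailing commas, so the result is valid JSON.
print(json.dumps(writer_options("2.0"), indent=2))
```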

Developed and maintained by the Altinity team.
