Choosing the Writer Version

Selfeer edited this page Nov 12, 2024 · 1 revision

Writer Version

In parquet-java, the writerVersion setting selects which version of the Parquet format the writer targets. It determines the data page format, the default encodings, and the page-level metadata produced during the write process.
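As a minimal sketch of how the "writerVersion" option string used in the examples on this page might be resolved: in parquet-java the setting corresponds to the `ParquetProperties.WriterVersion` enum, whose constants are `PARQUET_1_0` and `PARQUET_2_0`. The `WriterVersionOption` enum and its `fromOption` helper below are hypothetical stand-ins that only mirror those constant names so the snippet runs without the library. How the option is actually parsed is not shown on this page, so the mapping is an assumption.

```java
public class WriterVersionDemo {

    // Hypothetical stand-in that mirrors the constant names of parquet-java's
    // ParquetProperties.WriterVersion enum, so the sketch runs without the library.
    enum WriterVersionOption {
        PARQUET_1_0("1.0"),
        PARQUET_2_0("2.0");

        private final String optionValue;

        WriterVersionOption(String optionValue) {
            this.optionValue = optionValue;
        }

        // Resolve the string from the "writerVersion" option; reject anything else.
        static WriterVersionOption fromOption(String value) {
            for (WriterVersionOption version : values()) {
                if (version.optionValue.equals(value.trim())) {
                    return version;
                }
            }
            throw new IllegalArgumentException("Unsupported writerVersion: " + value);
        }
    }

    public static void main(String[] args) {
        System.out.println(WriterVersionOption.fromOption("1.0")); // prints PARQUET_1_0
        System.out.println(WriterVersionOption.fromOption("2.0")); // prints PARQUET_2_0
    }
}
```

With the real library, the resolved constant would typically be passed to a writer builder's `withWriterVersion(...)` method.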

Writer Version 1.0

  • Ensures compatibility with older readers that only support the original Parquet format.
  • Supports basic encoding methods like Plain and Dictionary encoding.
  • Uses the original data page format without additional metadata.
  • Commonly paired with widely supported compression codecs such as Snappy and Gzip; the codec is a separate setting and is not restricted by the writer version.

Full example here

  "options": {
    "writerVersion": "1.0"
  }

Writer Version 2.0

  • May not be readable by older readers that lack support for the Data Page V2 format or the newer encodings.
  • Introduces advanced encodings such as Delta encoding (DELTA_BINARY_PACKED, DELTA_LENGTH_BYTE_ARRAY), which improve compression efficiency for certain data types.
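  • Utilizes the Data Page V2 format, whose page header records more detailed metadata (such as row and null counts) and which stores repetition and definition levels uncompressed, outside the compressed data section. Page checksums are an optional page-header field and are not specific to V2.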
  • Often combined with newer compression codecs (Zstandard (ZSTD), Brotli, LZ4), which may offer better compression ratios; the codec is configured independently of the writer version.
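To illustrate why delta encoding can help with data such as sorted integers or timestamps, here is a deliberately simplified sketch: it stores the first value followed by the differences between consecutive values. This is not the actual DELTA_BINARY_PACKED layout, which also groups deltas into blocks and bit-packs them; the class and method names are hypothetical.

```java
import java.util.Arrays;

public class DeltaSketch {

    // Keep the first value, then store each value as the difference from its predecessor.
    static long[] deltaEncode(long[] values) {
        long[] encoded = new long[values.length];
        for (int i = 0; i < values.length; i++) {
            encoded[i] = (i == 0) ? values[0] : values[i] - values[i - 1];
        }
        return encoded;
    }

    // Rebuild the original values with a running sum over the deltas.
    static long[] deltaDecode(long[] encoded) {
        long[] values = new long[encoded.length];
        long running = 0;
        for (int i = 0; i < encoded.length; i++) {
            running += encoded[i];
            values[i] = running;
        }
        return values;
    }

    public static void main(String[] args) {
        long[] timestamps = {1700000000L, 1700000001L, 1700000003L, 1700000006L};
        long[] encoded = deltaEncode(timestamps);
        System.out.println(Arrays.toString(encoded)); // prints [1700000000, 1, 2, 3]
        System.out.println(Arrays.equals(deltaDecode(encoded), timestamps)); // prints true
    }
}
```

After the first entry, the encoded values are small numbers that need far fewer bits than the original timestamps, which is the property the real encodings exploit.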

Full example here

  "options": {
    "writerVersion": "2.0"
  }