Choosing the Writer Version
Selfeer edited this page Nov 12, 2024 · 1 revision
In parquet-java, the writerVersion option selects which version of the Parquet format is used when writing data to Parquet files. It determines the data page format, the encodings, and the page-level metadata applied during the write process. Two values are described here: 1.0 and 2.0.
Version 1.0:
- Ensures compatibility with older readers that only support the original Parquet format.
- Supports basic encodings such as Plain and Dictionary.
- Uses the original data page format (Data Page V1) without additional metadata.
- Commonly used with widely supported compression codecs such as Snappy and Gzip.
"options": {
  "writerVersion": "1.0"
}
Version 2.0:
- May not be readable by older readers, but introduces enhancements for newer systems.
- Introduces advanced encodings such as the delta encodings (DELTA_BINARY_PACKED, DELTA_LENGTH_BYTE_ARRAY), which improve compression efficiency for certain data types.
- Uses the Data Page V2 format, which records more detailed page-level metadata (such as row and null counts) that readers can use for better performance.
- Often used with newer compression codecs (Zstandard (ZSTD), Brotli, LZ4), which can offer better compression ratios. In parquet-java the codec is configured separately from the writer version.
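To see why delta encoding helps for some columns, the following is a minimal Python sketch of the underlying idea only. It is not the actual DELTA_BINARY_PACKED implementation, which additionally groups deltas into blocks and bit-packs them. The function names and the sample timestamps are hypothetical and chosen for illustration.

```python
def delta_encode(values):
    """Return the first value and the list of consecutive differences."""
    if not values:
        return None, []
    deltas = [b - a for a, b in zip(values, values[1:])]
    return values[0], deltas


def delta_decode(first, deltas):
    """Rebuild the original values from the first value and the deltas."""
    if first is None:
        return []
    values = [first]
    for d in deltas:
        values.append(values[-1] + d)
    return values


# Hypothetical, slowly increasing values (for example Unix timestamps).
timestamps = [1_700_000_000, 1_700_000_005, 1_700_000_009, 1_700_000_020]
first, deltas = delta_encode(timestamps)

# The deltas are far smaller than the raw values, so they need fewer bits.
raw_bits = max(timestamps).bit_length()
delta_bits = max(deltas).bit_length()
print(deltas, raw_bits, delta_bits)
```

For sorted or slowly changing integer columns the differences stay small, which is where this kind of encoding tends to pay off. For random values the deltas can be as large as the values themselves, and the benefit disappears.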
"options": {
  "writerVersion": "2.0"
}
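When configuration files are generated by a script, a small helper can keep the writerVersion value consistent. The sketch below is one possible approach, not part of the tool itself: the helper name writer_version_options is hypothetical, and it assumes that only the two string values shown on this page ("1.0" and "2.0") are accepted.

```python
import json

# Assumption: only the two values shown on this page are accepted.
SUPPORTED_WRITER_VERSIONS = ("1.0", "2.0")


def writer_version_options(version):
    """Build an "options" object that selects the given writer version."""
    if version not in SUPPORTED_WRITER_VERSIONS:
        raise ValueError(f"unsupported writerVersion: {version!r}")
    return {"options": {"writerVersion": version}}


# Serialize the fragment; json.dumps never emits a trailing comma.
print(json.dumps(writer_version_options("2.0"), indent=2))
```

The version is passed as a string on purpose: a numeric 2.0 is rejected by the check, which guards against writing the value without quotes.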
Developed and maintained by the Altinity team.