Configure With Hadoop
Selfeer edited this page Nov 12, 2024
As an alternative, configuration can be passed through the Hadoop library: list the Hadoop properties and their values under `hadoopConfigs`, for example:

    "hadoopConfigs": {
      "parquet.compression": "UNCOMPRESSED",
      "parquet.enable.dictionary": "true",
      "parquet.page.size": "1048576"
    }

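
The fragment above can also be generated rather than written by hand. The following is a minimal Python sketch, not part of the tool itself: `build_hadoop_configs` is a hypothetical helper, and it assumes that every value should be written as a string, as in the example above. Where the resulting object sits inside the full configuration file is not shown on this page, so the sketch only produces the `hadoopConfigs` part.

```python
import json


def build_hadoop_configs(properties):
    """Wrap Hadoop properties in a "hadoopConfigs" object (hypothetical helper).

    Values are converted to strings, matching the example fragment:
    booleans become "true"/"false", numbers become their decimal text.
    """
    def to_text(value):
        if isinstance(value, bool):
            return "true" if value else "false"
        return str(value)

    return {"hadoopConfigs": {key: to_text(value) for key, value in properties.items()}}


fragment = build_hadoop_configs({
    "parquet.compression": "UNCOMPRESSED",
    "parquet.enable.dictionary": True,
    "parquet.page.size": 1024 * 1024,  # 1 MiB, written out as "1048576"
})

# Print the object as JSON so it can be pasted into a configuration file.
print(json.dumps(fragment, indent=2))
```

Keeping the numeric values as Python integers until the last step makes sizes such as `1024 * 1024` easier to read than a bare `"1048576"`.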
A full description of each parameter can be found here.
| Property | Values | Default |
|---|---|---|
| `parquet.summary.metadata.level` | `all`, `common_only`, `none` | `all` |
| `parquet.enable.summary-metadata` | `true`, `false`, `NONE`, `all` | `true` |
| `parquet.block.size` | any integer value | `134217728` |
| `parquet.page.size` | any integer value | `1048576` |
| `parquet.compression` | `uncompressed`, `snappy`, `gzip`, `lzo`, `brotli`, `lz4`, `zstd`, `lz4_raw` | `uncompressed` |
| `parquet.enable.dictionary` | `true`, `false` | `true` |
| `parquet.dictionary.page.size` | any integer value | `1048576` |
| `parquet.writer.version` | `PARQUET_1_0`, `PARQUET_2_0` | `PARQUET_1_0` |
| `parquet.validation` | `true`, `false` | `false` |
| `parquet.columnindex.truncate.length` | any integer value | `64` |
| `parquet.statistics.truncate.length` | any integer value | `2147483647` |
| `parquet.bloom.filter.enabled` | `true`, `false` | `false` |
| `parquet.bloom.filter.enabled#column.path` | `true`, `false` | `false` |
| `parquet.bloom.filter.adaptive.enabled` | `true`, `false` | `false` |
| `parquet.bloom.filter.candidates.number` | any integer value | `5` |
| `parquet.bloom.filter.expected.ndv#column.path` | any integer value | `200` |
| `parquet.bloom.filter.fpp#column.path` | any floating-point value | `0.01` |
| `parquet.bloom.filter.max.bytes` | any integer value | `1048576` |
| `parquet.page.row.count.limit` | any integer value | `20000` |
| `parquet.page.write-checksum.enabled` | `true`, `false` | `true` |
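
A `hadoopConfigs` object can be checked against the table above before it is used. The sketch below is illustrative only: `check_hadoop_configs` is a hypothetical helper, it covers just a subset of the listed properties, and it assumes that values are compared case-insensitively (the example fragment uses `UNCOMPRESSED` while the table lists `uncompressed`). It also assumes that a `#` suffix in a key names a column path, so only the part before `#` is looked up.

```python
BOOLEAN = {"true", "false"}

# Properties with a fixed set of values, copied from the table (lower-cased).
ALLOWED_VALUES = {
    "parquet.summary.metadata.level": {"all", "common_only", "none"},
    "parquet.compression": {
        "uncompressed", "snappy", "gzip", "lzo", "brotli", "lz4", "zstd", "lz4_raw",
    },
    "parquet.enable.dictionary": BOOLEAN,
    "parquet.writer.version": {"parquet_1_0", "parquet_2_0"},
    "parquet.validation": BOOLEAN,
    "parquet.bloom.filter.enabled": BOOLEAN,
    "parquet.page.write-checksum.enabled": BOOLEAN,
}

# Properties the table describes as "any integer value" (subset).
INTEGER_PROPERTIES = {
    "parquet.block.size",
    "parquet.page.size",
    "parquet.dictionary.page.size",
    "parquet.page.row.count.limit",
}


def check_hadoop_configs(configs):
    """Return a list of problems; properties not covered here are ignored."""
    problems = []
    for key, value in configs.items():
        base = key.split("#", 1)[0]  # drop an optional per-column suffix
        text = str(value)
        if base in ALLOWED_VALUES:
            if text.lower() not in ALLOWED_VALUES[base]:
                problems.append(f"{key}: unexpected value {text!r}")
        elif base in INTEGER_PROPERTIES:
            try:
                int(text)
            except ValueError:
                problems.append(f"{key}: expected an integer, got {text!r}")
    return problems


for problem in check_hadoop_configs({
    "parquet.compression": "zip",     # not a listed codec
    "parquet.page.size": "1048576",   # fine
}):
    print(problem)
```

Returning a list of messages rather than raising on the first error lets all problems in a configuration be reported at once.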
Developed and maintained by the Altinity team.