Configure With Hadoop

Selfeer edited this page Nov 12, 2024 · 1 revision

Hadoop Configurations

Alternatively, configuration can be supplied through the Hadoop library: list the Parquet writer properties as Hadoop configuration entries under the "hadoopConfigs" key.

Full example here

  "hadoopConfigs": {
    "parquet.compression": "UNCOMPRESSED",
    "parquet.enable.dictionary": "true",
    "parquet.page.size": "1048576"
  }
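
The fragment above writes every value as a string, including the boolean and the integer. A minimal Python sketch of generating such a fragment is shown below. The helper name `build_hadoop_configs` is hypothetical, and the need for string values is an assumption inferred from the example rather than a documented requirement.

```python
import json


def build_hadoop_configs(settings):
    """Convert native Python values to strings, mirroring the fragment above.

    Assumption: the tool expects string values, as in the example.
    """
    configs = {}
    for key, value in settings.items():
        if isinstance(value, bool):
            # Python's str(True) is "True"; the example uses lowercase "true".
            configs[key] = "true" if value else "false"
        else:
            configs[key] = str(value)
    return configs


settings = {
    "parquet.compression": "UNCOMPRESSED",
    "parquet.enable.dictionary": True,
    "parquet.page.size": 1024 * 1024,  # 1 MiB = 1048576 bytes
}

fragment = {"hadoopConfigs": build_hadoop_configs(settings)}
print(json.dumps(fragment, indent=2))
```

Writing sizes as expressions such as `1024 * 1024` keeps the intent readable while still producing the plain byte count the property expects.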

Possible Configurations To Set With Hadoop

A full description of each parameter can be found here

| Property | Accepted values | Default |
| --- | --- | --- |
| parquet.summary.metadata.level | all, common_only, none | all |
| parquet.enable.summary-metadata | true (same as level all), false (same as level none) | true |
| parquet.block.size | any integer value (bytes) | 134217728 |
| parquet.page.size | any integer value (bytes) | 1048576 |
| parquet.compression | uncompressed, snappy, gzip, lzo, brotli, lz4, zstd, lz4_raw | uncompressed |
| parquet.enable.dictionary | true, false | true |
| parquet.dictionary.page.size | any integer value (bytes) | 1048576 |
| parquet.writer.version | PARQUET_1_0, PARQUET_2_0 | PARQUET_1_0 |
| parquet.validation | true, false | false |
| parquet.columnindex.truncate.length | any integer value | 64 |
| parquet.statistics.truncate.length | any integer value | 2147483647 |
| parquet.bloom.filter.enabled | true, false | false |
| parquet.bloom.filter.enabled#column.path | true, false | false |
| parquet.bloom.filter.adaptive.enabled | true, false | false |
| parquet.bloom.filter.candidates.number | any integer value | 5 |
| parquet.bloom.filter.expected.ndv#column.path | any integer value | 200 |
| parquet.bloom.filter.fpp#column.path | any decimal value between 0 and 1 | 0.01 |
| parquet.bloom.filter.max.bytes | any integer value (bytes) | 1048576 |
| parquet.page.row.count.limit | any integer value | 20000 |
| parquet.page.write-checksum.enabled | true, false | true |
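
The properties listed above can be checked before they are handed to the writer. The following is a minimal Python sketch, not part of the tool itself: `check_hadoop_configs` is a hypothetical helper, enumerated values are compared case-insensitively (an assumption, since the example uses "UNCOMPRESSED" while the list is lowercase), any text after `#` is treated as a column path, and `parquet.bloom.filter.fpp` is treated as a probability because its default is 0.01.

```python
ENUMS = {
    "parquet.summary.metadata.level": {"all", "common_only", "none"},
    "parquet.compression": {"uncompressed", "snappy", "gzip", "lzo",
                            "brotli", "lz4", "zstd", "lz4_raw"},
    "parquet.writer.version": {"parquet_1_0", "parquet_2_0"},
}
BOOLEANS = {
    "parquet.enable.summary-metadata", "parquet.enable.dictionary",
    "parquet.validation", "parquet.bloom.filter.enabled",
    "parquet.bloom.filter.adaptive.enabled",
    "parquet.page.write-checksum.enabled",
}
INTEGERS = {
    "parquet.block.size", "parquet.page.size", "parquet.dictionary.page.size",
    "parquet.columnindex.truncate.length", "parquet.statistics.truncate.length",
    "parquet.bloom.filter.candidates.number", "parquet.bloom.filter.expected.ndv",
    "parquet.bloom.filter.max.bytes", "parquet.page.row.count.limit",
}
PROBABILITIES = {"parquet.bloom.filter.fpp"}


def check_hadoop_configs(configs):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    for key, value in configs.items():
        base = key.split("#", 1)[0]  # drop an optional "#column.path" suffix
        text = str(value)
        if base in ENUMS:
            if text.lower() not in ENUMS[base]:
                problems.append(f"{key}: unsupported value {text!r}")
        elif base in BOOLEANS:
            if text.lower() not in {"true", "false"}:
                problems.append(f"{key}: expected true or false, got {text!r}")
        elif base in INTEGERS:
            try:
                int(text)
            except ValueError:
                problems.append(f"{key}: expected an integer, got {text!r}")
        elif base in PROBABILITIES:
            try:
                ok = 0.0 < float(text) < 1.0
            except ValueError:
                ok = False
            if not ok:
                problems.append(f"{key}: expected a value between 0 and 1, got {text!r}")
        else:
            problems.append(f"{key}: unknown property")
    return problems


for problem in check_hadoop_configs({
    "parquet.compression": "zip",              # not a listed codec
    "parquet.bloom.filter.fpp#user.id": "0.01",  # per-column setting, accepted
}):
    print(problem)
```

The column name `user.id` is only an illustration of the `#column.path` suffix; substitute the dotted path of a real column in the schema.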