Skip to content

Latest commit

 

History

History
62 lines (55 loc) · 14.9 KB

Configuration.md

File metadata and controls

62 lines (55 loc) · 14.9 KB
layout title nav_order
page
Configuration
4

Spark Configurations for Gluten Plugin

There are many configuration could impact the Gluten Plugin performance and can be fine tuned in Spark. You can add these configuration into spark-defaults.conf to enable or disable the setting.

Parameters Description Recommend Setting
spark.driver.extraClassPath To add Gluten Plugin jar file in Spark Driver /path/to/jar_file
spark.executor.extraClassPath To add Gluten Plugin jar file in Spark Executor /path/to/jar_file
spark.executor.memory To set up how much memory to be used for Spark Executor.
spark.memory.offHeap.size To set up how much memory to be used for Java OffHeap.
Please notice Gluten Plugin will leverage this setting to allocate memory space for native usage even offHeap is disabled.
The value is based on your system and it is recommended to set it larger if you are facing Out of Memory issue in Gluten Plugin
30G
spark.sql.sources.useV1SourceList Choose to use V1 source avro
spark.sql.join.preferSortMergeJoin To turn off preferSortMergeJoin in Spark false
spark.plugins To load Gluten's components by Spark's plug-in loader com.intel.oap.GlutenPlugin
spark.shuffle.manager To turn on Gluten Columnar Shuffle Plugin org.apache.spark.shuffle.sort.ColumnarShuffleManager
spark.gluten.enabled Enable Gluten, default is true true
spark.gluten.sql.columnar.scanOnly When enabled, this config will overwrite all other operators' enabling, and only Scan and Filter pushdown will be offloaded to native. false
spark.gluten.sql.columnar.batchscan Enable or Disable Columnar Batchscan, default is true true
spark.gluten.sql.columnar.hashagg Enable or Disable Columnar Hash Aggregate, default is true true
spark.gluten.sql.columnar.project Enable or Disable Columnar Project, default is true true
spark.gluten.sql.columnar.filter Enable or Disable Columnar Filter, default is true true
spark.gluten.sql.columnar.codegen.sort Enable or Disable Columnar Sort, default is true true
spark.gluten.sql.columnar.window Enable or Disable Columnar Window, default is true true
spark.gluten.sql.columnar.shuffledHashJoin Enable or Disable ShffuledHashJoin, default is true true
spark.gluten.sql.columnar.forceShuffledHashJoin Force to use ShffuledHashJoin over SortMergeJoin, default is true true
spark.gluten.sql.columnar.sort Enable or Disable Columnar Sort, default is true true
spark.gluten.sql.columnar.sortMergeJoin Enable or Disable Columnar Sort Merge Join, default is true true
spark.gluten.sql.columnar.union Enable or Disable Columnar Union, default is true true
spark.gluten.sql.columnar.expand Enable or Disable Columnar Expand, default is true true
spark.gluten.sql.columnar.broadcastExchange Enable or Disable Columnar Broadcast Exchange, default is true true
spark.gluten.sql.columnar.broadcastJoin Enable or Disable Columnar BradcastHashJoin, default is true true
spark.gluten.sql.columnar.shuffle.codec Set up the codec to be used for Columnar Shuffle, default is lz4 lz4
spark.gluten.sql.columnar.shuffle.codecBackend Enable using hardware accelerators for shuffle de/compression. Valid options are QAT and IAA.
spark.gluten.sql.columnar.numaBinding Set up NUMABinding, default is false true
spark.gluten.sql.columnar.coreRange Set up the core range for NUMABinding, only works when numaBinding set to true.
The setting is based on the number of cores in your system. Use 72 cores as an example.
0-17,36-53 |18-35,54-71
spark.gluten.sql.columnar.qat Enable using QAT for shuffle compression. false
spark.gluten.sql.native.bloomFilter Enable of Disable native runtime bloomfilter true
spark.gluten.sql.columnar.wholeStage.fallback.threshold Configure the threshold for whether whole stage will fall back in AQE supported case by counting the number of ColumnarToRow & vanilla leaf node >= 3
spark.gluten.sql.columnar.maxBatchSize Set the number of rows for the output batch 4096
spark.gluten.loadLibFromJar Gluten will load dynamic link library from jars for gluten/cpp. false
spark.gluten.sql.columnar.force.hashagg Force to use hash agg to replace sort agg. true

Below is an example for spark-default.conf, if you are using conda to install OAP project.

##### Columnar Process Configuration

spark.sql.sources.useV1SourceList avro
spark.plugins io.glutenproject.GlutenPlugin
spark.shuffle.manager org.apache.spark.shuffle.sort.ColumnarShuffleManager
spark.gluten.sql.columnar.backend.lib=velox # Valid options: velox, clickhouse
spark.driver.extraClassPath ${GLUTEN_HOME}/package/target/gluten-<>-jar-with-dependencies.jar
spark.executor.extraClassPath ${GLUTEN_HOME}/package/target/gluten-<>-jar-with-dependencies.jar
######