Backend
VL (Velox)
Bug description
Hi team,
I am experimenting with off-heap memory to understand its effect on the TPC-H benchmark with the Gluten + Velox backend. However, I noticed that increasing off-heap memory does not consistently speed up all 22 TPC-H queries: the speedup is not uniform, and some queries (Q3, Q11, Q18, Q19, and Q20) either regress or show no noticeable improvement.
Can you help me understand how off-heap memory affects query performance, and how I might improve performance for these specific queries?
Experiments Conducted:
I tested the following off-heap and on-heap memory combinations (Executors: 4x4 means 4 executor instances with 4 cores each, per spark.executor.instances=4 and spark.executor.cores=4 below):
Off-heap memory: 6GB, Executor memory: 30GB, Executors: 4x4
Off-heap memory: 12GB, Executor memory: 30GB, Executors: 4x4
Off-heap memory: 20GB, Executor memory: 30GB, Executors: 4x4
Test Environment:
Instance: ARM-based AWS instance (m7g.4xlarge)
VCPUs: 16
Memory: 64GB
Spark Version: 3.5.2
Data Size: Scale Factor SF=100
Any insights or recommendations on optimizing these queries with off-heap memory would be greatly appreciated.
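For context, a back-of-the-envelope memory budget may explain part of the non-uniform behavior. The sketch below is a rough check, not a diagnosis: it assumes all four executors are co-located on a single 64GB m7g.4xlarge node, and it uses the per-executor figures from the spark-shell configs below (spark.executor.memory=7500m, spark.executor.memoryOverhead=2g) rather than the 30GB figure above, which appears to be the 4-executor total.

```python
# Rough per-executor and per-node memory footprint for each off-heap
# setting. Assumption: 4 executors on one 64 GB node; heap and overhead
# taken from the posted configs (7500m heap, 2g overhead).
GB = 1024  # work in MiB to keep the arithmetic exact

executor_heap_mb = 7500
overhead_mb = 2 * GB
node_mem_mb = 64 * GB
executors_per_node = 4

for offheap_gb in (6, 12, 20):
    per_exec = executor_heap_mb + overhead_mb + offheap_gb * GB
    per_node = per_exec * executors_per_node
    fits = "fits" if per_node <= node_mem_mb else "OVERCOMMITS"
    print(f"off-heap {offheap_gb:>2} GB: per-executor {per_exec / GB:.1f} GB, "
          f"node total {per_node / GB:.1f} GB of {node_mem_mb // GB} GB ({fits})")
```

Under these assumptions, only the 6GB setting keeps the four executors within a single node's 64GB; the 12GB and 20GB settings would overcommit a single node, which could itself account for regressions on some queries. If the executors are actually spread across multiple nodes, this check does not apply.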
Spark version
Spark-3.5.x
Spark configurations
cat tpch_parquet.scala | ${SPARK_HOME}/bin/spark-shell \
  --master spark://172.32.5.244:7077 --deploy-mode client \
  --conf spark.plugins=org.apache.gluten.GlutenPlugin \
  --conf spark.driver.extraClassPath=${GLUTEN_JAR} \
  --conf spark.executor.extraClassPath=${GLUTEN_JAR} \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size= \
  --conf spark.gluten.sql.columnar.forceShuffledHashJoin=true \
  --conf spark.driver.memory=4G \
  --conf spark.executor.instances=4 \
  --conf spark.executor.memory=7500m \
  --conf spark.executor.cores=4 \
  --conf spark.sql.shuffle.partitions=32 \
  --conf spark.executor.memoryOverhead=2g \
  --conf spark.driver.maxResultSize=2g \
  --conf spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager \
  --conf spark.driver.extraJavaOptions="--illegal-access=permit -Dio.netty.tryReflectionSetAccessible=true --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED" \
  --conf spark.executor.extraJavaOptions="--illegal-access=permit -Dio.netty.tryReflectionSetAccessible=true --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED"
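Since the only parameter varied across the three experiments is spark.memory.offHeap.size, the sweep can be scripted. The helper below is a hypothetical sketch, not part of the original setup: it rebuilds the invocation above for each off-heap size (master URL, script name, and confs are copied from the command; SPARK_HOME defaults to a made-up path when unset).

```python
# Hypothetical sweep helper: emit one spark-shell command per off-heap
# size so the three experiments can be run back to back.
import os

BASE_CONFS = [
    "spark.plugins=org.apache.gluten.GlutenPlugin",
    "spark.memory.offHeap.enabled=true",
    "spark.gluten.sql.columnar.forceShuffledHashJoin=true",
    "spark.executor.instances=4",
    "spark.executor.memory=7500m",
    "spark.executor.cores=4",
    "spark.sql.shuffle.partitions=32",
    "spark.executor.memoryOverhead=2g",
]

def build_cmd(offheap_size: str) -> str:
    """Assemble the spark-shell command line for one off-heap setting."""
    confs = BASE_CONFS + [f"spark.memory.offHeap.size={offheap_size}"]
    flags = " ".join(f"--conf {c}" for c in confs)
    spark_home = os.environ.get("SPARK_HOME", "/opt/spark")  # assumed default
    return (f"cat tpch_parquet.scala | {spark_home}/bin/spark-shell "
            f"--master spark://172.32.5.244:7077 --deploy-mode client {flags}")

for size in ("6g", "12g", "20g"):
    print(build_cmd(size))
```

Each printed line could then be passed to a shell (or subprocess.run with shell=True) and timed per query to compare the three settings.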
System information
Gluten Version: 1.3.0-SNAPSHOT
Commit: 4dfdfd7
CMake Version: 3.28.3
System: Linux-6.8.0-1021-aws
Arch: aarch64
CPU Name:
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 11.4.0
C Compiler: /usr/bin/cc
C Compiler Version: 11.4.0
CMake Prefix Path: /usr/local;/usr;/;/usr/local/lib/python3.10/dist-packages/cmake/data;/usr/local;/usr/X11R6;/usr/pkg;/opt
Relevant logs