[VL] A few TPC-H queries perform worse after increasing off-heap memory #8571

Open

VaibhavFRI opened this issue Jan 20, 2025 · 0 comments

Labels: bug (Something isn't working), triage
Backend

VL (Velox)

Bug description

Hi team,

I am experimenting with off-heap memory to understand its effect on the TPC-H benchmark with the Gluten + Velox backend. However, I noticed that increasing off-heap memory does not consistently speed up all 22 TPC-H queries: the speedup is not uniform, and some queries, such as Q18, Q19, Q20, Q11, and Q3, either perform worse or show no noticeable improvement.

Can you help me understand the effect of off-heap memory on query performance, and how I might improve performance for these specific queries?

Experiments Conducted:
I tested the following off-heap and on-heap memory combinations (a sweep sketch follows this list):

Off-heap memory: 6GB, total executor memory: 30GB (4 × 7500m), executors: 4 instances × 4 cores
Off-heap memory: 12GB, total executor memory: 30GB (4 × 7500m), executors: 4 instances × 4 cores
Off-heap memory: 20GB, total executor memory: 30GB (4 × 7500m), executors: 4 instances × 4 cores

Test Environment:
Instance: ARM-based AWS instance (m7g.4xlarge)
vCPUs: 16
Memory: 64GB
Spark version: 3.5.2
Data size: TPC-H scale factor SF=100

Any insights or recommendations on optimizing these queries with off-heap memory would be greatly appreciated.
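
For reference, a minimal sketch of one way to drive this sweep. The wrapper loop and the COMMON_CONF shorthand are illustrative only, not the exact script used; the full flag list is in the "Spark configurations" section below.

# Illustrative sweep over the three off-heap sizes tested above.
# COMMON_CONF stands in for the remaining --conf flags from the full
# spark-shell command shown under "Spark configurations".
COMMON_CONF="--conf spark.plugins=org.apache.gluten.GlutenPlugin --conf spark.executor.instances=4 --conf spark.executor.memory=7500m --conf spark.executor.cores=4"
for OFFHEAP in 6g 12g 20g; do
  cat tpch_parquet.scala | ${SPARK_HOME}/bin/spark-shell \
    --master spark://172.32.5.244:7077 --deploy-mode client \
    --conf spark.memory.offHeap.enabled=true \
    --conf spark.memory.offHeap.size=${OFFHEAP} \
    ${COMMON_CONF}
done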

[Attached image: per-query benchmark results]

Spark version

Spark-3.5.x

Spark configurations

cat tpch_parquet.scala | ${SPARK_HOME}/bin/spark-shell \
  --master spark://172.32.5.244:7077 --deploy-mode client \
  --conf spark.plugins=org.apache.gluten.GlutenPlugin \
  --conf spark.driver.extraClassPath=${GLUTEN_JAR} \
  --conf spark.executor.extraClassPath=${GLUTEN_JAR} \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size= \
  --conf spark.gluten.sql.columnar.forceShuffledHashJoin=true \
  --conf spark.driver.memory=4G \
  --conf spark.executor.instances=4 \
  --conf spark.executor.memory=7500m \
  --conf spark.executor.cores=4 \
  --conf spark.sql.shuffle.partitions=32 \
  --conf spark.executor.memoryOverhead=2g \
  --conf spark.driver.maxResultSize=2g \
  --conf spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager \
  --conf spark.driver.extraJavaOptions="--illegal-access=permit -Dio.netty.tryReflectionSetAccessible=true --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED" \
  --conf spark.executor.extraJavaOptions="--illegal-access=permit -Dio.netty.tryReflectionSetAccessible=true --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED"
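
A rough per-executor memory budget implied by these settings (my own back-of-the-envelope accounting; whether all four executors share one node is not confirmed):

# Approximate memory claimed per executor with off-heap size X:
#   on-heap   (spark.executor.memory)          = 7500m (~7.3GB)
#   overhead  (spark.executor.memoryOverhead)  = 2g
#   off-heap  (spark.memory.offHeap.size)      = X
# total ≈ 9.3GB + X per executor. With 4 executors: ~61GB (X=6g),
# ~85GB (X=12g), ~117GB (X=20g). If all four executors land on a single
# 64GB m7g.4xlarge, the larger off-heap settings would oversubscribe
# physical memory, which could itself explain regressions on some queries.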

System information

Gluten Version: 1.3.0-SNAPSHOT
Commit: 4dfdfd7
CMake Version: 3.28.3
System: Linux-6.8.0-1021-aws
Arch: aarch64
CPU Name:
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 11.4.0
C Compiler: /usr/bin/cc
C Compiler Version: 11.4.0
CMake Prefix Path: /usr/local;/usr;/;/usr/local/lib/python3.10/dist-packages/cmake/data;/usr/local;/usr/X11R6;/usr/pkg;/opt

Relevant logs

VaibhavFRI added the bug (Something isn't working) and triage labels on Jan 20, 2025