
Generating speedup factors for Dataproc GKE L4 GPU instances #617

Merged: 1 commit merged into NVIDIA:dev on Oct 17, 2023

Conversation

@parthosa (Collaborator) commented on Oct 13, 2023

Contributes to #253. This PR adds support for Dataproc GKE with L4 GPU instances to the core tools.

Changes:

Metrics

  • Actual GPU Speed Up: 3.74
  • Estimated Speed Up: 3.09
  • Error: -21.04% (see the sketch after this list)
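
Not part of this PR, but as a quick sanity check on the numbers above: a minimal Scala sketch, assuming the error is reported relative to the estimated speedup (that assumption reproduces the -21.04% figure; the object and variable names are made up for the example).

```scala
// Hedged sketch (not from this PR): reproduces the error figure above,
// assuming error = (estimated - actual) / estimated.
object SpeedupError {
  def main(args: Array[String]): Unit = {
    val actualSpeedup    = 3.74 // measured GPU speedup from the benchmark run
    val estimatedSpeedup = 3.09 // speedup estimated by the tools
    val errorPct = (estimatedSpeedup - actualSpeedup) / estimatedSpeedup * 100
    println(f"Error: $errorPct%.2f%%") // prints: Error: -21.04%
  }
}
```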

Spark Config

GPU:

```properties
spark.kubernetes.container.image=knobby/spark-rapids:latest
spark.executor.resource.gpu.vendor=nvidia.com
spark.executor.extraClassPath=/opt/spark/jars/rapids-4-spark_2.12-23.10.0-20230828.131939-24-cuda11.jar
spark.driver.extraClassPath=/opt/spark/jars/rapids-4-spark_2.12-23.10.0-20230828.131939-24-cuda11.jar
spark.executor.resource.gpu.amount=1
spark.executor.resource.gpu.discoveryScript=/opt/spark/getGpusResources.sh
spark.plugins=com.nvidia.spark.SQLPlugin
spark.dynamicAllocation.enabled=false
spark.task.resource.gpu.amount=0.0625
spark.driver.maxResultSize=2GB
spark.driver.memory=50G
spark.executor.cores=16
spark.executor.instances=8
spark.rapids.sql.batchSizeBytes=1GB
spark.locality.wait=0
spark.rapids.sql.concurrentGpuTasks=2
spark.executor.memory=16G
spark.sql.files.maxPartitionBytes=2gb
spark.rapids.memory.host.spillStorageSize=32G
spark.rapids.memory.pinnedPool.size=8g
spark.executor.memoryOverhead=16G
spark.scheduler.minRegisteredResourcesRatio=1.0
spark.rapids.shuffle.multiThreaded.writer.threads=16
spark.rapids.shuffle.multiThreaded.reader.threads=16
spark.rapids.shuffle.mode=MULTITHREADED
```
  • Workload Information:
    • Controller: 1 X n1-standard-16
    • Driver: 1 X n1-standard-16
    • Executors: 8 X n1-standard-32 (with 8 X L4 24GB)
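
For readers unfamiliar with these settings, here is a minimal, illustrative Scala sketch (not code from this PR, which only adds speedup-factor data) showing how a few of the GPU-run properties above could be assembled with SparkConf. The keys and values are copied from the listing; the object and value names are made up. It also spells out where the 0.0625 per-task GPU share comes from: 1 GPU per executor divided across 16 executor cores.

```scala
import org.apache.spark.SparkConf

// Illustrative only: mirrors a few of the GPU-run properties listed above.
object GpuRunConf {
  // 1 GPU per executor shared across 16 cores => 1.0 / 16 = 0.0625 per task
  val taskGpuAmount: Double = 1.0 / 16

  val conf: SparkConf = new SparkConf()
    .set("spark.plugins", "com.nvidia.spark.SQLPlugin")
    .set("spark.executor.cores", "16")
    .set("spark.executor.resource.gpu.amount", "1")
    .set("spark.task.resource.gpu.amount", taskGpuAmount.toString)
    .set("spark.rapids.sql.concurrentGpuTasks", "2")
    .set("spark.executor.resource.gpu.discoveryScript",
      "/opt/spark/getGpusResources.sh")
}
```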

CPU:

```properties
spark.kubernetes.container.image=knobby/spark-rapids:latest
spark.dynamicAllocation.enabled=false
spark.driver.maxResultSize=2GB
spark.driver.memory=50G
spark.executor.cores=16
spark.executor.instances=8
spark.locality.wait=0
spark.executor.memory=16G
spark.sql.files.maxPartitionBytes=2gb
spark.executor.memoryOverhead=16G
spark.scheduler.minRegisteredResourcesRatio=1.0
spark.sql.adaptive.enabled=true
spark.sql.broadcastTimeout=1200
```
  • Workload Information:
    • Controller: 1 X n1-standard-16
    • Driver: 1 X n1-standard-16
    • Executors: 8 X n1-standard-32
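
As a side note, the CPU run uses the same cluster shape and executor sizing as the GPU run, minus the RAPIDS/GPU settings (and with AQE and a broadcast timeout set instead). A small hypothetical Scala sketch that diffs a subset of the two listings makes that visible; the maps below only hold key/value pairs copied from this description.

```scala
// Hypothetical sketch (not from this PR): diff a subset of the GPU-run and
// CPU-run properties listed above to see which settings are GPU-specific.
object ConfigDiff {
  def main(args: Array[String]): Unit = {
    val gpu = Map(
      "spark.executor.cores"                -> "16",
      "spark.executor.memory"               -> "16G",
      "spark.executor.memoryOverhead"       -> "16G",
      "spark.plugins"                       -> "com.nvidia.spark.SQLPlugin",
      "spark.rapids.sql.concurrentGpuTasks" -> "2")
    val cpu = Map(
      "spark.executor.cores"          -> "16",
      "spark.executor.memory"         -> "16G",
      "spark.executor.memoryOverhead" -> "16G",
      "spark.sql.adaptive.enabled"    -> "true",
      "spark.sql.broadcastTimeout"    -> "1200")
    val gpuOnly = (gpu.keySet -- cpu.keySet).toSeq.sorted
    val cpuOnly = (cpu.keySet -- gpu.keySet).toSeq.sorted
    println(s"GPU-run only: ${gpuOnly.mkString(", ")}")
    println(s"CPU-run only: ${cpuOnly.mkString(", ")}")
  }
}
```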

@parthosa parthosa added the core_tools (Scope the core module (scala)) label on Oct 13, 2023
@parthosa parthosa self-assigned this Oct 13, 2023
@parthosa parthosa changed the title Add support in core tools for Dataproc GKE L4 instances Generating speedup factors applicable for Dataproc GKE for L4 GPU instances Oct 13, 2023
@parthosa parthosa changed the title Generating speedup factors applicable for Dataproc GKE for L4 GPU instances Generating speedup factors for Dataproc GKE for L4 GPU instances Oct 13, 2023
@parthosa parthosa changed the title Generating speedup factors for Dataproc GKE for L4 GPU instances Generating speedup factors for Dataproc GKE L4 GPU instances Oct 13, 2023
@parthosa parthosa merged commit a96ff2e into NVIDIA:dev Oct 17, 2023
9 checks passed
@parthosa parthosa deleted the spark-rapids-tools-253-l4 branch October 17, 2023 17:43