
Generating speedup factors for Dataproc GKE L4 GPU instances #617

Merged: 1 commit merged into NVIDIA:dev on Oct 17, 2023

Conversation

@parthosa (Collaborator) commented on Oct 13, 2023

Contributes to #253. This PR adds support for Dataproc GKE with L4 GPU instances to the core tools.

Changes:

Metrics

  • Actual GPU Speed Up: 3.74
  • Estimated Speed Up: 3.09
  • Error: -21.04% (see the sketch after this list)
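
Not part of this PR, but as a quick sanity check on the numbers above: a minimal Scala sketch, assuming the error is reported relative to the estimated speedup (that assumption reproduces the -21.04% figure; the object and variable names are made up for the example).

```scala
// Hedged sketch (not from this PR): reproduces the error figure above,
// assuming error = (estimated - actual) / estimated.
object SpeedupError {
  def main(args: Array[String]): Unit = {
    val actualSpeedup    = 3.74 // measured GPU speedup from the benchmark run
    val estimatedSpeedup = 3.09 // speedup estimated by the tools
    val errorPct = (estimatedSpeedup - actualSpeedup) / estimatedSpeedup * 100
    println(f"Error: $errorPct%.2f%%") // prints: Error: -21.04%
  }
}
```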

Spark Config

GPU:

```properties
spark.kubernetes.container.image=knobby/spark-rapids:latest
spark.executor.resource.gpu.vendor=nvidia.com
spark.executor.extraClassPath=/opt/spark/jars/rapids-4-spark_2.12-23.10.0-20230828.131939-24-cuda11.jar
spark.driver.extraClassPath=/opt/spark/jars/rapids-4-spark_2.12-23.10.0-20230828.131939-24-cuda11.jar
spark.executor.resource.gpu.amount=1
spark.executor.resource.gpu.discoveryScript=/opt/spark/getGpusResources.sh
spark.plugins=com.nvidia.spark.SQLPlugin
spark.dynamicAllocation.enabled=false
spark.task.resource.gpu.amount=0.0625
spark.driver.maxResultSize=2GB
spark.driver.memory=50G
spark.executor.cores=16
spark.executor.instances=8
spark.rapids.sql.batchSizeBytes=1GB
spark.locality.wait=0
spark.rapids.sql.concurrentGpuTasks=2
spark.executor.memory=16G
spark.sql.files.maxPartitionBytes=2gb
spark.rapids.memory.host.spillStorageSize=32G
spark.rapids.memory.pinnedPool.size=8g
spark.executor.memoryOverhead=16G
spark.scheduler.minRegisteredResourcesRatio=1.0
spark.rapids.shuffle.multiThreaded.writer.threads=16
spark.rapids.shuffle.multiThreaded.reader.threads=16
spark.rapids.shuffle.mode=MULTITHREADED
```
  • Workload Information:
    • Controller: 1 X n1-standard-16
    • Driver: 1 X n1-standard-16
    • Executors: 8 X n1-standard-32 (with 8 X L4 24GB)
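
For readers unfamiliar with these settings, here is a minimal, illustrative Scala sketch (not code from this PR, which only adds speedup-factor data) showing how a few of the GPU-run properties above could be assembled with SparkConf. The keys and values are copied from the listing; the object and value names are made up. It also spells out where the 0.0625 per-task GPU share comes from: 1 GPU per executor divided across 16 executor cores.

```scala
import org.apache.spark.SparkConf

// Illustrative only: mirrors a few of the GPU-run properties listed above.
object GpuRunConf {
  // 1 GPU per executor shared across 16 cores => 1.0 / 16 = 0.0625 per task
  val taskGpuAmount: Double = 1.0 / 16

  val conf: SparkConf = new SparkConf()
    .set("spark.plugins", "com.nvidia.spark.SQLPlugin")
    .set("spark.executor.cores", "16")
    .set("spark.executor.resource.gpu.amount", "1")
    .set("spark.task.resource.gpu.amount", taskGpuAmount.toString)
    .set("spark.rapids.sql.concurrentGpuTasks", "2")
    .set("spark.executor.resource.gpu.discoveryScript",
      "/opt/spark/getGpusResources.sh")
}
```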

CPU:

```properties
spark.kubernetes.container.image=knobby/spark-rapids:latest
spark.dynamicAllocation.enabled=false
spark.driver.maxResultSize=2GB
spark.driver.memory=50G
spark.executor.cores=16
spark.executor.instances=8
spark.locality.wait=0
spark.executor.memory=16G
spark.sql.files.maxPartitionBytes=2gb
spark.executor.memoryOverhead=16G
spark.scheduler.minRegisteredResourcesRatio=1.0
spark.sql.adaptive.enabled=true
spark.sql.broadcastTimeout=1200
```
  • Workload Information:
    • Controller: 1 X n1-standard-16
    • Driver: 1 X n1-standard-16
    • Executors: 8 X n1-standard-32
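
As a side note, the CPU run uses the same cluster shape and executor sizing as the GPU run, minus the RAPIDS/GPU settings (and with AQE and a broadcast timeout set instead). A small hypothetical Scala sketch that diffs a subset of the two listings makes that visible; the maps below only hold key/value pairs copied from this description.

```scala
// Hypothetical sketch (not from this PR): diff a subset of the GPU-run and
// CPU-run properties listed above to see which settings are GPU-specific.
object ConfigDiff {
  def main(args: Array[String]): Unit = {
    val gpu = Map(
      "spark.executor.cores"                -> "16",
      "spark.executor.memory"               -> "16G",
      "spark.executor.memoryOverhead"       -> "16G",
      "spark.plugins"                       -> "com.nvidia.spark.SQLPlugin",
      "spark.rapids.sql.concurrentGpuTasks" -> "2")
    val cpu = Map(
      "spark.executor.cores"          -> "16",
      "spark.executor.memory"         -> "16G",
      "spark.executor.memoryOverhead" -> "16G",
      "spark.sql.adaptive.enabled"    -> "true",
      "spark.sql.broadcastTimeout"    -> "1200")
    val gpuOnly = (gpu.keySet -- cpu.keySet).toSeq.sorted
    val cpuOnly = (cpu.keySet -- gpu.keySet).toSeq.sorted
    println(s"GPU-run only: ${gpuOnly.mkString(", ")}")
    println(s"CPU-run only: ${cpuOnly.mkString(", ")}")
  }
}
```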

@parthosa parthosa added the core_tools (Scope the core module (scala)) label on Oct 13, 2023
@parthosa parthosa self-assigned this Oct 13, 2023
@parthosa parthosa changed the title Add support in core tools for Dataproc GKE L4 instances Generating speedup factors applicable for Dataproc GKE for L4 GPU instances Oct 13, 2023
@parthosa parthosa changed the title Generating speedup factors applicable for Dataproc GKE for L4 GPU instances Generating speedup factors for Dataproc GKE for L4 GPU instances Oct 13, 2023
@parthosa parthosa changed the title Generating speedup factors for Dataproc GKE for L4 GPU instances Generating speedup factors for Dataproc GKE L4 GPU instances Oct 13, 2023
@parthosa parthosa merged commit a96ff2e into NVIDIA:dev Oct 17, 2023
9 checks passed
@parthosa parthosa deleted the spark-rapids-tools-253-l4 branch October 17, 2023 17:43