Help with Cluster optimization for a 150 node Trino 457 cluster running on r6g.16xlarge #24817
soham-dasgupta
started this conversation in
General
Hi Team, I am looking for help optimizing our Trino (457) cluster running on emr-7.6.0. Our main use case is to read from Glue tables backed by S3, transform the data, and write it back to S3 using CTAS. I am trying to benchmark the cluster setup below using a CTAS query that joins two tables: one has 61 billion records and the other has 58 billion records.
Here is the coordinator config:
Here is the catalog configuration:
I am benchmarking the cluster against this fairly complex query to find the levers I can pull to optimize it.
Link to query: https://pastecode.io/s/xfeai33m
Link to query plan: https://pastecode.io/s/rj9nx0hh
Row counts:
stg_dim_ad_group: 61153245481
fact_sa_ad_group_dly: 58372216116