From a325690473803cfaf5858c67f9e32e4b5413af85 Mon Sep 17 00:00:00 2001 From: Dan King Date: Tue, 27 Feb 2024 17:18:18 -0500 Subject: [PATCH] [batch] reduce CPU use of delete_prev_..._resource_records (#14366) I assessed three query options, but keep in mind that the database is currently at nearly 100% CPU so these timings are high variance. I'm honestly pretty surprised that the first one isn't the fastest. It seems to me like the obvious SQL query and yet it does not perform as well as those using lateral joins. Option 2 and Option 3 both use lateral joins. Option 2 uses a LEFT JOIN followed by a `WHERE ... IS NOT NULL` which is definitionally an `INNER JOIN`. For that reason, I also explored Option 3, directly using `INNER JOIN`, which appears to be very slightly faster and lower variance. | Option | Timing (mean + stddev, in seconds) | |--------|------------------------------------| | 1 | 9.92 +- 2.02 | | 2 | 5.35 +- 1.87 | | 3 | 5.11 +- 1.15 | ## Option 1 ``` SELECT DISTINCT group_resources.batch_id, group_resources.update_id, group_resources.job_group_id FROM job_group_inst_coll_cancellable_resources AS group_resources INNER JOIN job_group_self_and_ancestors AS descendant ON descendant.batch_id = group_resources.batch_id AND descendant.job_group_id = group_resources.job_group_id INNER JOIN job_groups_cancelled AS cancelled ON descendant.batch_id = cancelled.id AND descendant.ancestor_id = cancelled.job_group_id ORDER BY group_resources.batch_id desc, group_resources.update_id desc, group_resources.job_group_id desc LIMIT 1000; ``` ``` 1000 rows in set (11.81 sec) 1000 rows in set (8.91 sec) 1000 rows in set (11.86 sec) 1000 rows in set (7.08 sec) ``` ## Option 2 ``` SELECT DISTINCT group_resources.batch_id, group_resources.update_id, group_resources.job_group_id FROM job_group_inst_coll_cancellable_resources AS group_resources LEFT JOIN LATERAL ( SELECT 1 AS cancelled FROM job_group_self_and_ancestors AS descendant INNER JOIN job_groups_cancelled AS cancelled ON descendant.batch_id = cancelled.id AND descendant.ancestor_id = cancelled.job_group_id WHERE descendant.batch_id = group_resources.batch_id AND descendant.job_group_id = group_resources.job_group_id ) AS t ON TRUE WHERE t.cancelled IS NOT NULL ORDER BY group_resources.batch_id desc, group_resources.update_id desc, group_resources.job_group_id desc LIMIT 1000; ``` ``` 1000 rows in set (3.68 sec) 1000 rows in set (5.24 sec) 1000 rows in set (8.43 sec) 1000 rows in set (4.03 sec) ``` ## Option 3 ``` SELECT DISTINCT group_resources.batch_id, group_resources.update_id, group_resources.job_group_id FROM job_group_inst_coll_cancellable_resources AS group_resources INNER JOIN LATERAL ( SELECT 1 FROM job_group_self_and_ancestors AS descendant INNER JOIN job_groups_cancelled AS cancelled ON descendant.batch_id = cancelled.id AND descendant.ancestor_id = cancelled.job_group_id WHERE descendant.batch_id = group_resources.batch_id AND descendant.job_group_id = group_resources.job_group_id ) AS t ON TRUE ORDER BY group_resources.batch_id desc, group_resources.update_id desc, group_resources.job_group_id desc LIMIT 1000; ``` ``` 1000 rows in set (6.74 sec) 1000 rows in set (5.63 sec) 1000 rows in set (3.86 sec) 1000 rows in set (4.20 sec) ``` --- batch/batch/driver/main.py | 29 +++++++++++++++-------------- 1 file changed, 15 insertions(+), 14 deletions(-) diff --git a/batch/batch/driver/main.py b/batch/batch/driver/main.py index 8965f5c528c..5499ac861d6 100644 --- a/batch/batch/driver/main.py +++ b/batch/batch/driver/main.py @@ -1461,21 +1461,22 @@ async def delete_committed_job_groups_inst_coll_staging_records(db: Database): async def delete_prev_cancelled_job_group_cancellable_resources_records(db: Database): targets = db.execute_and_fetchall( """ -SELECT job_group_inst_coll_cancellable_resources.batch_id, - job_group_inst_coll_cancellable_resources.update_id, - job_group_inst_coll_cancellable_resources.job_group_id -FROM job_group_inst_coll_cancellable_resources -LEFT JOIN LATERAL ( - SELECT 1 AS cancelled - FROM job_group_self_and_ancestors - INNER JOIN job_groups_cancelled - ON job_group_self_and_ancestors.batch_id = job_groups_cancelled.id AND - job_group_self_and_ancestors.ancestor_id = job_groups_cancelled.job_group_id - WHERE job_group_inst_coll_cancellable_resources.batch_id = job_group_self_and_ancestors.batch_id AND - job_group_inst_coll_cancellable_resources.job_group_id = job_group_self_and_ancestors.job_group_id +SELECT DISTINCT + group_resources.batch_id, + group_resources.update_id, + group_resources.job_group_id +FROM job_group_inst_coll_cancellable_resources AS group_resources +INNER JOIN LATERAL ( + SELECT + 1 + FROM job_group_self_and_ancestors AS descendant + INNER JOIN job_groups_cancelled AS cancelled + ON descendant.batch_id = cancelled.id + AND descendant.ancestor_id = cancelled.job_group_id + WHERE descendant.batch_id = group_resources.batch_id + AND descendant.job_group_id = group_resources.job_group_id ) AS t ON TRUE -WHERE t.cancelled IS NOT NULL -GROUP BY job_group_inst_coll_cancellable_resources.batch_id, job_group_inst_coll_cancellable_resources.update_id, job_group_inst_coll_cancellable_resources.job_group_id +ORDER BY group_resources.batch_id desc, group_resources.update_id desc, group_resources.job_group_id desc LIMIT 1000; """, query_name='find_cancelled_cancellable_resources_records_to_delete',