Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[batch] reduce CPU use of delete_prev_..._resource_records (#14366)
I assessed three query options, but keep in mind that the database is currently at nearly 100% CPU so these timings are high variance. I'm honestly pretty surprised that the first one isn't the fastest. It seems to me like the obvious SQL query and yet it does not perform as well as those using lateral joins. Option 2 and Option 3 both use lateral joins. Option 2 uses a LEFT JOIN followed by a `WHERE ... IS NOT NULL` which is definitionally an `INNER JOIN`. For that reason, I also explored Option 3, directly using `INNER JOIN`, which appears to be very slightly faster and lower variance. | Option | Timing (mean + stddev, in seconds) | |--------|------------------------------------| | 1 | 9.92 +- 2.02 | | 2 | 5.35 +- 1.87 | | 3 | 5.11 +- 1.15 | ## Option 1 ``` SELECT DISTINCT group_resources.batch_id, group_resources.update_id, group_resources.job_group_id FROM job_group_inst_coll_cancellable_resources AS group_resources INNER JOIN job_group_self_and_ancestors AS descendant ON descendant.batch_id = group_resources.batch_id AND descendant.job_group_id = group_resources.job_group_id INNER JOIN job_groups_cancelled AS cancelled ON descendant.batch_id = cancelled.id AND descendant.ancestor_id = cancelled.job_group_id ORDER BY group_resources.batch_id desc, group_resources.update_id desc, group_resources.job_group_id desc LIMIT 1000; ``` ``` 1000 rows in set (11.81 sec) 1000 rows in set (8.91 sec) 1000 rows in set (11.86 sec) 1000 rows in set (7.08 sec) ``` ## Option 2 ``` SELECT DISTINCT group_resources.batch_id, group_resources.update_id, group_resources.job_group_id FROM job_group_inst_coll_cancellable_resources AS group_resources LEFT JOIN LATERAL ( SELECT 1 AS cancelled FROM job_group_self_and_ancestors AS descendant INNER JOIN job_groups_cancelled AS cancelled ON descendant.batch_id = cancelled.id AND descendant.ancestor_id = cancelled.job_group_id WHERE descendant.batch_id = group_resources.batch_id AND descendant.job_group_id = group_resources.job_group_id ) AS t ON TRUE WHERE t.cancelled IS NOT NULL ORDER BY group_resources.batch_id desc, group_resources.update_id desc, group_resources.job_group_id desc LIMIT 1000; ``` ``` 1000 rows in set (3.68 sec) 1000 rows in set (5.24 sec) 1000 rows in set (8.43 sec) 1000 rows in set (4.03 sec) ``` ## Option 3 ``` SELECT DISTINCT group_resources.batch_id, group_resources.update_id, group_resources.job_group_id FROM job_group_inst_coll_cancellable_resources AS group_resources INNER JOIN LATERAL ( SELECT 1 FROM job_group_self_and_ancestors AS descendant INNER JOIN job_groups_cancelled AS cancelled ON descendant.batch_id = cancelled.id AND descendant.ancestor_id = cancelled.job_group_id WHERE descendant.batch_id = group_resources.batch_id AND descendant.job_group_id = group_resources.job_group_id ) AS t ON TRUE ORDER BY group_resources.batch_id desc, group_resources.update_id desc, group_resources.job_group_id desc LIMIT 1000; ``` ``` 1000 rows in set (6.74 sec) 1000 rows in set (5.63 sec) 1000 rows in set (3.86 sec) 1000 rows in set (4.20 sec) ```
- Loading branch information