-
Notifications
You must be signed in to change notification settings - Fork 248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[batch] reduce CPU use of delete_prev_..._resource_records #14366
Conversation
I assessed three query options, but keep in mind that the database is currently at nearly 100% CPU so these timings are high variance. I'm honestly pretty surprised that the first one isn't the fastest. It seems to me like the obvious SQL query and yet it does not perform as well as those using lateral joins. Option 2 and Option 3 both use lateral joins. Option 2 uses a LEFT JOIN followed by a `WHERE ... IS NOT NULL` which is definitionally an `INNER JOIN`. For that reason, I also explored Option 3, directly using `INNER JOIN`, which appears to be very slightly faster and lower variance. | Option | Timing (mean + stddev, in seconds) | |--------|------------------------------------| | 1 | 9.92 +- 2.02 | | 2 | 5.35 +- 1.87 | | 3 | 5.11 +- 1.15 | ## Option 1 ``` SELECT DISTINCT group_resources.batch_id, group_resources.update_id, group_resources.job_group_id FROM job_group_inst_coll_cancellable_resources AS group_resources INNER JOIN job_group_self_and_ancestors AS descendant ON descendant.batch_id = group_resources.batch_id AND descendant.job_group_id = group_resources.job_group_id INNER JOIN job_groups_cancelled AS cancelled ON descendant.batch_id = cancelled.id AND descendant.ancestor_id = cancelled.job_group_id ORDER BY group_resources.batch_id desc, group_resources.update_id desc, group_resources.job_group_id desc LIMIT 1000; ``` ``` 1000 rows in set (11.81 sec) 1000 rows in set (8.91 sec) 1000 rows in set (11.86 sec) 1000 rows in set (7.08 sec) ``` ## Option 2 ``` SELECT DISTINCT group_resources.batch_id, group_resources.update_id, group_resources.job_group_id FROM job_group_inst_coll_cancellable_resources AS group_resources LEFT JOIN LATERAL ( SELECT 1 AS cancelled FROM job_group_self_and_ancestors AS descendant INNER JOIN job_groups_cancelled AS cancelled ON descendant.batch_id = cancelled.id AND descendant.ancestor_id = cancelled.job_group_id WHERE descendant.batch_id = group_resources.batch_id AND descendant.job_group_id = group_resources.job_group_id ) AS t ON TRUE WHERE t.cancelled IS NOT NULL ORDER BY group_resources.batch_id desc, group_resources.update_id desc, group_resources.job_group_id desc LIMIT 1000; ``` ``` 1000 rows in set (3.68 sec) 1000 rows in set (5.24 sec) 1000 rows in set (8.43 sec) 1000 rows in set (4.03 sec) ``` ## Option 3 ``` SELECT DISTINCT group_resources.batch_id, group_resources.update_id, group_resources.job_group_id FROM job_group_inst_coll_cancellable_resources AS group_resources INNER JOIN LATERAL ( SELECT 1 FROM job_group_self_and_ancestors AS descendant INNER JOIN job_groups_cancelled AS cancelled ON descendant.batch_id = cancelled.id AND descendant.ancestor_id = cancelled.job_group_id WHERE descendant.batch_id = group_resources.batch_id AND descendant.job_group_id = group_resources.job_group_id ) AS t ON TRUE ORDER BY group_resources.batch_id desc, group_resources.update_id desc, group_resources.job_group_id desc LIMIT 1000; ``` ``` 1000 rows in set (6.74 sec) 1000 rows in set (5.63 sec) 1000 rows in set (3.86 sec) 1000 rows in set (4.20 sec) ```
32aa19b
to
27d1fad
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. I also find the first query far easier to read than the lateral joins, but this seems the right change to make for now.
Following up on this. I can't measure a difference between 1 and 3 and both execution plans are identical. In prod, both queries return empty. |
If the query plans are identical I’d expect identical performance. Maybe the DB version was incremented and the optimizer is better? These queries are only interesting when the system is under enough load to have “cancellable resources” i.e. running jobs. |
You would think, except for #14623... ideally you would be correct |
Thanks for your reply! Maybe/I hope so? IMO the |
We're now using |
I do not recall but I think it was after the upgrade to major version 8 |
I assessed three query options, but keep in mind that the database is currently at nearly 100% CPU so these timings are high variance. I'm honestly pretty surprised that the first one isn't the fastest. It seems to me like the obvious SQL query and yet it does not perform as well as those using lateral joins.
Option 2 and Option 3 both use lateral joins. Option 2 uses a LEFT JOIN followed by a
WHERE ... IS NOT NULL
which is definitionally anINNER JOIN
. For that reason, I also explored Option 3, directly usingINNER JOIN
, which appears to be very slightly faster and lower variance.Option 1
Option 2
Option 3