Skip to content

Conversation

@SeungjinYang
Copy link
Collaborator

Separated out from #7602

The database is given an index on created_at. This is because several queries in requests.py have ORDER BY created_at statements, which can be accelerated by an index.

Tested (run the relevant ones):

  • Code formatting: install pre-commit (auto-check on commit) or bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: /smoke-test (CI) or pytest tests/test_smoke.py (local)
  • Relevant individual tests: /smoke-test -k test_name (CI) or pytest tests/test_smoke.py::test_name (local)
  • Backward compatibility: /quicktest-core (CI) or pytest tests/smoke_tests/test_backward_compat.py (local)

@SeungjinYang SeungjinYang marked this pull request as draft October 16, 2025 19:11
@SeungjinYang
Copy link
Collaborator Author

SeungjinYang commented Oct 16, 2025

Another interesting thing we can do in lieu of adding an index is, given that the request ID is uuid7 now (see #7639), sort by request ID which already has an index.

The pro of this alternate approach is we get to only maintain one index instead of two indices.
The con of this alternate approach is we lose out a bit on precision - python time.time() which is used to generate created_at has microsecond precision on linux, whereas uuid7 timestamp only provides milisecond level precision.

edit: The PR referenced got reverted

@SeungjinYang SeungjinYang marked this pull request as ready for review October 16, 2025 19:18
@SeungjinYang SeungjinYang force-pushed the request-db-optimizations-1 branch from 4917f3b to fe88f4b Compare October 16, 2025 19:19
Copy link
Collaborator

@cg505 cg505 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the index also include other fields we filter by? e.g. name, status

@SeungjinYang
Copy link
Collaborator Author

should the index also include other fields we filter by? e.g. name, status

That's a good point. created_at is a no brainer because of ORDER BYs, but we can also index columns used in WHERE clauses.

I took account of what columns we filter most by and found:

  • include_request_names / exclude_request_names is specified in RequestTaskFilter a lot, so we can index on name
  • cluster_names is filtered on quite a bit as well, so we can index on cluster_name column as well.
  • status is also a crowd favorite for filtering as well

So I'm thinking we add index for name and cluster_name, and status column.

@SeungjinYang SeungjinYang changed the title [db] Add index on "created_at" column in requests db [db] Add indices on requests db columns Oct 16, 2025
@SeungjinYang SeungjinYang requested a review from cg505 October 16, 2025 21:30
Copy link
Collaborator

@cg505 cg505 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool

@SeungjinYang
Copy link
Collaborator Author

sqlite> PRAGMA index_list(requests);
0|status_idx|0|c|0
1|name_idx|0|c|0
2|cluster_name_idx|0|c|0
3|created_at_idx|0|c|0
4|sqlite_autoindex_requests_1|1|pk|0

@SeungjinYang SeungjinYang enabled auto-merge (squash) October 16, 2025 21:59
@SeungjinYang SeungjinYang disabled auto-merge October 16, 2025 22:05
@SeungjinYang SeungjinYang force-pushed the request-db-optimizations-1 branch from 49e76e3 to 2ad503b Compare October 16, 2025 22:26
@SeungjinYang SeungjinYang enabled auto-merge (squash) October 16, 2025 22:41
@SeungjinYang SeungjinYang disabled auto-merge October 16, 2025 22:54
@SeungjinYang SeungjinYang force-pushed the request-db-optimizations-1 branch from cb78b6d to 8ff9489 Compare October 16, 2025 23:05
@SeungjinYang SeungjinYang marked this pull request as draft October 16, 2025 23:17
@SeungjinYang SeungjinYang marked this pull request as ready for review October 16, 2025 23:32
@SeungjinYang SeungjinYang force-pushed the request-db-optimizations-1 branch 2 times, most recently from 387b30e to 0dbeae3 Compare October 17, 2025 00:51
@SeungjinYang SeungjinYang force-pushed the request-db-optimizations-1 branch from 2ddaed2 to 2727ca2 Compare October 17, 2025 06:54
@SeungjinYang SeungjinYang enabled auto-merge (squash) October 17, 2025 06:57
@SeungjinYang SeungjinYang disabled auto-merge October 17, 2025 07:14
@SeungjinYang SeungjinYang force-pushed the request-db-optimizations-1 branch from 4e3f229 to 3256d43 Compare October 17, 2025 07:16
@SeungjinYang SeungjinYang enabled auto-merge (squash) October 17, 2025 07:25
@SeungjinYang SeungjinYang merged commit 33025aa into master Oct 17, 2025
20 checks passed
@SeungjinYang SeungjinYang deleted the request-db-optimizations-1 branch October 17, 2025 07:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants