Sorted dict #14

Draft
wants to merge 80 commits into
base: main

Collaborator

@amascolo amascolo commented Oct 27, 2024

Adds support for sort-based and hybrid queries (i.e. those using SortedDict and Range). Includes all of these queries ported over to SDQL.

@amascolo amascolo self-assigned this Oct 27, 2024
@amascolo
Collaborator Author

Details that came up during implementation:

  1. Added a Timer expression in SDQL to mark where the timer should start, since we need to exclude the initial data sorting time in benchmarks (a nice side effect is that we no longer need to treat load expressions as a special case).

  2. Added an external function SortedIndices to sort the initial data.

  3. Added an external function SortedVec to sort the tries.

  4. Special case to call emplace_back: when constructing @vec { <...> -> 1 }, we interpret the relational form <...> -> 1 as i -> <...> and append elements to a std::vector.

  5. Special case to call at: in SortedDict we can't use [ ], which has special semantics for insertion (this wasn't an issue previously for vecdict, since after construction we only iterated over it, i.e. set semantics, whereas here we access elements).

  6. Cache the last found element, to avoid calling SortedDict::find twice in situations where we check with contains and then retrieve the element using at (in handwritten C++ code we could assign the result of find, i.e. an iterator, to a variable, but this isn't representable in SDQL).
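
To illustrate points 5 and 6, here is a minimal sketch of the idea rather than the code in this PR. `CachedSortedDict`, its members and the `searches()` counter are hypothetical names made up for this example. It assumes the dictionary is a vector of (key, value) pairs already sorted by key. Lookup goes through `at`, which never inserts (unlike `std::map::operator[]`, which default-constructs a value for a missing key), and `contains` remembers the position of a hit so that a following `at` on the same key skips the second binary search.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for a sorted dictionary: a vector of (key, value)
// pairs kept sorted by key. Not the SortedDict class from this PR.
template <typename K, typename V>
class CachedSortedDict {
public:
    // Assumes `data` is already sorted by key and has no duplicate keys.
    explicit CachedSortedDict(std::vector<std::pair<K, V>> data)
        : data_(std::move(data)) {}

    // Binary search for `key`; on a hit, remember its position for at().
    bool contains(const K& key) const {
        ++searches_;  // counts binary searches, for illustration only
        auto it = std::lower_bound(
            data_.begin(), data_.end(), key,
            [](const std::pair<K, V>& entry, const K& k) {
                return entry.first < k;
            });
        if (it != data_.end() && it->first == key) {
            cached_ = static_cast<std::size_t>(it - data_.begin());
            has_cached_ = true;
            return true;
        }
        has_cached_ = false;
        return false;
    }

    // Read-only lookup: never inserts, throws if the key is missing.
    // Reuses the cached position when the previous hit was the same key.
    const V& at(const K& key) const {
        if (has_cached_ && data_[cached_].first == key) {
            return data_[cached_].second;
        }
        if (!contains(key)) {
            throw std::out_of_range("key not found");
        }
        return data_[cached_].second;
    }

    // Number of binary searches performed so far.
    std::size_t searches() const { return searches_; }

private:
    std::vector<std::pair<K, V>> data_;
    mutable std::size_t cached_ = 0;
    mutable bool has_cached_ = false;
    mutable std::size_t searches_ = 0;
};
```

With this sketch, the generated pattern `if (d.contains(k)) use(d.at(k));` performs one binary search instead of two, which approximates what handwritten C++ gets by keeping the iterator returned by `find`.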

@amascolo
Collaborator Author

I have ported query FJ 3a to SDQL and checked that its performance is identical to the handwritten C++.

Codegen and runtime should work everywhere; I just need to generate all the other queries in SDQL.

@amascolo
Collaborator Author

@amirsh this PR is now ready to merge, subject to your approval:

  • Supports all sort-based and hybrid JOB queries in SDQL ✅
  • Performance within 2% of C++ (no regression on hash-based queries)

@amascolo amascolo requested a review from amirsh November 28, 2024 16:19
@amascolo amascolo marked this pull request as draft November 29, 2024 13:15
3 participants