-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Implement nested join optimization #3843
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
We could also look at DuckDB join reordering: https://www.youtube.com/watch?v=aNRoR0Z3SzU I filed a duplicate issue before I saw this one, although mine is specifically for the logical plan. #3984 |
Other resources: The MonetDB Architecture Martin Kersten CWI - https://homepages.cwi.nl/~manegold/teaching/adt/lectures/lecture2.pdf |
Hi, is there any progress? |
Hi, I've been doing some reading on the side and I am interested into taking a stab at this if the issue is still open and no one is working on it. |
take |
Small update; I started looking at the initial join selection rule implemented by backtracking from the reference to I've also started revisiting the no-statistics approach of DuckDB to try and get a better intuition for how their cardinality estimator approach works, implementation wise I think I want to get a small set of queries with nested joins running to get a better view of the current way they are handled before I draft an early implementation. |
Hi @clflushopt In my opinion getting Cardinality Estimation / interval analysis is higher priority as that feature has many use cases and is needed for almost any more sophisticated join ordering approach. In terms of a more sophisticated join ordering approach, I have some ideas about this which I will try and write up over the nect week or two |
Hey @alamb thanks for the clear answer, yes that sounds good ! It's seems that both ticket for interval boundary and selectivity analysis for |
Thanks @clflushopt I don't have a great handle in my head on the current state of Boundary and Selectivity anaylsis. Maybe your first PRs could focus on adding some docs and examples to get warmed up and then we can start adding features (like for AND/OR) |
Hey @alamb I have a small change in #14688 to demo boundary analysis (as I understand from the existing code), If this looks like a suitable initial example I can add one that demonstrates how |
This sounds amazing -- I look forward to it
Likewise amazing
I recommend somewhere in https://datafusion.apache.org/library-user-guide/index.html |
Hey @alamb following up on #14688 I made a new pull request in #14735 to add an example that demonstrates how analysis works for |
BTW I wrote some thoughts on join ordering in a blog post (part 2) |
|
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
For complex queries, like those in TCP-H and TCP-DS it is essential to find a good Join order.
HashBuildProbeOrder
implements a rule to optimize the probe / build side of joins, but this only optimizes the joins locally (e.g. swapping the join left / right).We should implement an algorithm that (tries to) find a (close to) global optimum based on the total estimated cost of the joins.
Describe the solution you'd like
Implement an efficient algorithm for optimizing
I'm not sure what the SOTA is on this. Some material I found with some Googling:
https://db.in.tum.de/teaching/ws1415/queryopt/chapter3.pdf
https://db.in.tum.de/~radke/papers/hugejoins.pdf
https://www.cockroachlabs.com/blog/join-ordering-pt1/
https://www.cockroachlabs.com/blog/join-ordering-ii-the-ikkbz-algorithm/
http://mlwiki.org/index.php/Join_Ordering
Describe alternatives you've considered
Additional context
The text was updated successfully, but these errors were encountered: