-
Notifications
You must be signed in to change notification settings - Fork 1.5k
General framework to decorrelate the subqueries #5492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
According to the discussions in this issue, i think we can list the following items to support a subqueries decorrelation framework:
|
|
I think a crucial starting point is to transform all correlated exprs into |
In case others haven't heard, @irenjj is working on additional subquery support as part of a Google Summer of Code Project (where @jayzhan211 and I are helping mentor). Perhaps @suibianwanwank and @duongcongtoai would be interested in being involved too |
cool, actually in this PR: #16016 i'm trying to introduce a unified structure to do decorrelation for the following usecases:
The parts that are not implemented are left with unimplemented! errors. I'm trying to make the PR mergable, so everyone can continue to contribute in parallel, but my current goal is to implement enough to satisfy existing slt tests |
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
In the current DataFusion, it has very limited support for correlated subqueries. It can only decorrelate the (NOT) IN/Exists predicate subqueries to Semi/Anti Joins. Even in the simplest IN/Exists cases, if the correlated expressions are not in the Filter/Join conditions, the current decorrelate rules will not support them.
In the paper "Unnesting Arbitrary Queries" by T. Neumann; A. Kemper
(http://www.btw-2015.de/res/proceedings/Hauptband/Wiss/Neumann-Unnesting_Arbitrary_Querie.pdf). It raise a mechanism to unnest arbitrary queries. This was already implemented by the Hyper DB:
For example:
select * from orders
where 1 in (select 1 from part left join (select l_partkey from lineitem where o_orderkey = 2) lineitem on p_partkey = lineitem.l_partkey)
https://hyper-db.de/interface.html#
Both SparkSQL and PostgreSQL do not support decorrelate such kind of queries.
Describe the solution you'd like
Describe alternatives you've considered
Additional context
The text was updated successfully, but these errors were encountered: