[SPARK-52187] Introduce Join pushdown for DSv2 #50921
Open
+840
−18
What changes were proposed in this pull request?
This PR introduces the join pushdown interface for DSv2 connectors and its implementation for JDBC connectors.
The interface itself is SupportsPushDownJoin. If isRightSideCompatibleForJoin returns true, Spark attempts to push the join down (the attempt can still fail).
With this implementation, only inner joins are supported. Left and right joins should be added as well. Cross joins won't be supported since they can increase the amount of data that is being read.
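As an illustration only, the two-step flow described above (a compatibility check, followed by a pushdown attempt that may still fail) could be modelled roughly as follows. This is a simplified, self-contained Java sketch and not the code in this PR. PushDownJoinBuilder, JoinKind, FakeJdbcBuilder, tryPushDown, and the rule that two sides are compatible when they share a JDBC URL are all assumptions made for the example. Only the name isRightSideCompatibleForJoin and the inner-join restriction come from the description.

```java
// Hypothetical model of the flow; none of these types are Spark's real classes.
final class JoinPushdownSketch {
    // Assumed stand-in for a join type; not Spark's actual enum.
    enum JoinKind { INNER, LEFT, RIGHT, CROSS }

    // Assumed, simplified shape of a builder that can accept a pushed-down join.
    interface PushDownJoinBuilder {
        boolean isRightSideCompatibleForJoin(PushDownJoinBuilder right);
        boolean pushDownJoin(PushDownJoinBuilder right, JoinKind kind, String condition);
    }

    // Toy JDBC-like builder. Assumption: sides are compatible if they share a URL.
    static final class FakeJdbcBuilder implements PushDownJoinBuilder {
        final String url;
        final String table;
        String pushedSql; // stays null until a join has been pushed down

        FakeJdbcBuilder(String url, String table) {
            this.url = url;
            this.table = table;
        }

        @Override
        public boolean isRightSideCompatibleForJoin(PushDownJoinBuilder right) {
            return right instanceof FakeJdbcBuilder
                && ((FakeJdbcBuilder) right).url.equals(url);
        }

        @Override
        public boolean pushDownJoin(PushDownJoinBuilder right, JoinKind kind, String condition) {
            // Even a compatible right side can be rejected: only inner joins are accepted.
            if (kind != JoinKind.INNER || !(right instanceof FakeJdbcBuilder)) {
                return false;
            }
            FakeJdbcBuilder other = (FakeJdbcBuilder) right;
            pushedSql = "SELECT * FROM " + table + " JOIN " + other.table + " ON " + condition;
            return true;
        }
    }

    // Step 1: check compatibility. Step 2: attempt the pushdown, which may still fail.
    static boolean tryPushDown(PushDownJoinBuilder left, PushDownJoinBuilder right,
                               JoinKind kind, String condition) {
        if (!left.isRightSideCompatibleForJoin(right)) {
            return false;
        }
        return left.pushDownJoin(right, kind, condition);
    }

    public static void main(String[] args) {
        FakeJdbcBuilder orders = new FakeJdbcBuilder("jdbc:h2:mem:db1", "orders");
        FakeJdbcBuilder customers = new FakeJdbcBuilder("jdbc:h2:mem:db1", "customers");
        boolean pushed = tryPushDown(orders, customers, JoinKind.INNER,
                                     "orders.cid = customers.id");
        System.out.println(pushed + " -> " + orders.pushedSql);
    }
}
```

In this sketch a false return from either step simply means the join is not pushed down, which mirrors the point that a compatible right side does not guarantee the pushdown succeeds.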
Apart from the H2 dialect, which is used for testing, none of the JDBC dialects currently supports join pushdown. The join pushdown capability is guarded by the SQLConf spark.sql.optimizer.datasourceV2JoinPushdown, the JDBC option pushDownJoin, and the JDBC dialect method supportsJoin.
Why are the changes needed?
DSv2 connectors can't push down the join operator.
Does this PR introduce any user-facing change?
No, not by itself, since the behaviour is not implemented for any of the connectors (besides H2, which is a JDBC dialect used for testing).
How was this patch tested?
New tests and some local testing with TPC-DS queries.
Was this patch authored or co-authored using generative AI tooling?