Skip to content

[SPARK-52187] Introduce Join pushdown for DSv2 #50921

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

PetarVasiljevic-DB
Copy link
Contributor

@PetarVasiljevic-DB PetarVasiljevic-DB commented May 16, 2025

What changes were proposed in this pull request?

With this PR I am introducing the Join pushdown interface for DSv2 connectors and it's implementation for JDBC connectors.

The interface itself, SupportsPushDownJoin has the following API:

public interface SupportsPushDownJoin extends ScanBuilder {
    boolean isRightSideCompatibleForJoin(SupportsPushDownJoin other);

    boolean pushJoin(
            SupportsPushDownJoin other,
            JoinType joinType,
            Optional<Predicate> condition,
            StructType leftRequiredSchema,
            StructType rightRequiredSchema
    );
}

If isRightSideCompatibleForJoin is true, then the join will be tried to be pushed down (it can still fail though).

With this implementation, only Inner joins are supported. Left and Right joins should be added as well. Cross joins won't be supported since they can increase the amount of data that is being read.

Also, none of the dialects currently supports the join push down. It is only available for H2 dialect. The join push down capability is guarded by SQLConf spark.sql.optimizer.datasourceV2JoinPushdown, JDBC option pushDownJoin and JDBC dialect method supportsJoin.

Why are the changes needed?

DSv2 connectors can't push down the join operator.

Does this PR introduce any user-facing change?

This PR itself no since the behaviour is not implemented for any of the connectors (besides H2 which is testing JDBC dialect).

How was this patch tested?

New tests and some local testing with TPCDS queries.

Was this patch authored or co-authored using generative AI tooling?

@github-actions github-actions bot added the SQL label May 16, 2025
@PetarVasiljevic-DB PetarVasiljevic-DB changed the title introduce join pushdown for dsv2 [SPARK-52187] Introduce Join pushdown for DSv2 May 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant