Commit 6bde48c

AveryQi115 authored and cloud-fan committed
[SPARK-49646][SQL] add spark config for fixing subquery decorrelation for union/set operations when parentOuterReferences has references not covered in collectedChildOuterReferences
### What changes were proposed in this pull request?

Adds a Spark config for the change in apache#48109.

### Why are the changes needed?

For safer backports.

### Does this PR introduce _any_ user-facing change?

Yes, it adds a user-facing config, `spark.sql.optimizer.decorrelateUnionOrSetOpUnderLimit.enabled`. Setting it to true enables decorrelation of subqueries that have correlated references under Union/Set operators which are themselves under Limit operators. It defaults to true; setting it to false reverts Spark to the incorrect legacy behavior, which raises exceptions when decorrelating the above query patterns.

### How was this patch tested?

N/A

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#49536 from AveryQi115/SPARK-49646-2.

Lead-authored-by: Avery Qi <[email protected]>
Co-authored-by: Avery <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
1 parent 6658846 commit 6bde48c

2 files changed: +18 -1 lines changed

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/DecorrelateInnerQuery.scala

Lines changed: 8 additions & 1 deletion
@@ -1064,7 +1064,14 @@ object DecorrelateInnerQuery extends PredicateHelper {
           // Project, they could get added at the beginning or the end of the output columns
           // depending on the child plan.
           // The inner expressions for the domain are the values of newOuterReferenceMap.
-          val domainProjections = newOuterReferences.map(newOuterReferenceMap(_))
+          val domainProjections =
+            if (SQLConf.get.getConf(
+              SQLConf.DECORRELATE_UNION_OR_SET_OP_UNDER_LIMIT_ENABLED
+            )) {
+              newOuterReferences.map(newOuterReferenceMap(_))
+            } else {
+              collectedChildOuterReferences.map(newOuterReferenceMap(_))
+            }
           val newChild = Project(child.output ++ domainProjections, decorrelatedChild)
           (newChild, newJoinCond, newOuterReferenceMap)
         }
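
To make the effect of the branch in this hunk easier to see, here is a minimal sketch in plain Scala. It is not Catalyst code: outer references and domain attributes are modeled as strings, and `DomainProjectionSketch`, its parameter names, and the sample values are hypothetical. It assumes, as the commit title suggests, that `newOuterReferences` can contain references that `collectedChildOuterReferences` does not.

```scala
object DomainProjectionSketch {
  // Simplified stand-in for the branch in the diff: strings replace Catalyst
  // attributes, and flagEnabled replaces the SQLConf lookup (both assumptions).
  def domainProjections(
      flagEnabled: Boolean,
      newOuterReferences: Seq[String],
      collectedChildOuterReferences: Seq[String],
      newOuterReferenceMap: Map[String, String]): Seq[String] = {
    // Enabled: project a domain attribute for every new outer reference.
    // Disabled: legacy path, only the references collected from the child.
    val refs = if (flagEnabled) newOuterReferences else collectedChildOuterReferences
    refs.map(newOuterReferenceMap(_))
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data: "t1.b" comes only from the parent.
    val collected = Seq("t1.a")
    val all = Seq("t1.a", "t1.b")
    val mapping = Map("t1.a" -> "a_domain", "t1.b" -> "b_domain")

    println(domainProjections(flagEnabled = true, all, collected, mapping))
    println(domainProjections(flagEnabled = false, all, collected, mapping))
  }
}
```

In this toy model the legacy path omits the domain attribute for `t1.b`, so the projected output would be missing a column for that reference.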

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

Lines changed: 10 additions & 0 deletions
@@ -3998,6 +3998,16 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val DECORRELATE_UNION_OR_SET_OP_UNDER_LIMIT_ENABLED =
+    buildConf("spark.sql.optimizer.decorrelateUnionOrSetOpUnderLimit.enabled")
+      .internal()
+      .doc("Decorrelate UNION or SET operation under LIMIT operator. If not enabled," +
+        "revert to legacy incorrect behavior for certain subqueries with correlation under" +
+        "UNION/SET operator with a LIMIT operator above it.")
+      .version("4.0.0")
+      .booleanConf
+      .createWithDefault(true)
+
   val DECORRELATE_EXISTS_IN_SUBQUERY_LEGACY_INCORRECT_COUNT_HANDLING_ENABLED =
     buildConf("spark.sql.optimizer.decorrelateExistsSubqueryLegacyIncorrectCountHandling.enabled")
       .internal()
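
As a rough illustration of how the new flag could be toggled at runtime, the sketch below sets it on a local `SparkSession`. The object name, app name, and `local[1]` master are assumptions for the example. It needs Spark on the classpath, and the flag only affects the optimizer on a build that includes this commit.

```scala
import org.apache.spark.sql.SparkSession

object ToggleDecorrelationFlag {
  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session, for illustration only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("decorrelate-flag-sketch")
      .getOrCreate()

    val key = "spark.sql.optimizer.decorrelateUnionOrSetOpUnderLimit.enabled"

    // The config defaults to true; false selects the legacy code path,
    // which the commit message describes as incorrect.
    spark.conf.set(key, false)
    println(spark.conf.get(key))

    // Restore the default before running real queries.
    spark.conf.set(key, true)
    spark.stop()
  }
}
```

Because the config is marked `.internal()`, it is meant as a safety switch for backports, not a routine tuning knob.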
