branch-4.1: [fix](nereids) Guard UniqueFunction in multiple filter/topn pushdown rules #62742 #64336
Open
yujun777 wants to merge 1 commit into
…rules (apache#62742)

Problem Summary:

Several Nereids rewrite rules still moved predicates containing non-idempotent (unique) functions such as rand() / uuid() / random_bytes() across operator boundaries in ways that changed query semantics. The common root cause is that a predicate like `rand() > 0.5` has an empty input-slot set, so the `containsAll(emptySet)` / `allMatch` guards used by these rules silently returned true and allowed unsafe push down / elimination.

This PR adds `containsUniqueFunction()` guards to the following rules:

1. PushDownFilterThroughRepeat: skip conjuncts with unique functions. Pushing `rand() > x` below Repeat changes which rows feed each grouping set and alters aggregate results.
2. PushDownFilterThroughWindow: skip conjuncts with unique functions. Pushing a unique predicate below a window operator re-samples the base rows and changes every window-function value.
3. PushDownFilterThroughPartitionTopN: same as Window; skip unique conjuncts in the split loop.
4. PushDownFilterThroughSetOperation: do not push volatile conjuncts below `UNION DISTINCT`, `INTERSECT`, or `EXCEPT`; only `UNION ALL` keeps the original row-to-row semantics.
5. PushDownJoinOtherCondition: keep volatile ON predicates in the join when pushing them into a single child would change evaluation from per joined pair to per input row.
6. AddProjectForVolatileExpression: when a volatile expression is repeated after rewrites such as BETWEEN expansion, materialize it via a child project so repeated references share one value instead of being re-evaluated independently.
7. InferPredicates / JoinUtils.isHashJoinCondition: add the same unique function guards to avoid deriving or classifying unsafe predicates.

Fix wrong results when predicates containing rand(), uuid(), random_bytes(), or uuid_numeric() are pushed across Repeat, Window, PartitionTopN, SetOperation, or Join boundaries, and when repeated volatile expressions are materialized for reuse.
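The root cause described above (an empty input-slot set passing a `containsAll` check vacuously) can be illustrated with a minimal, self-contained sketch. The `Conjunct` class, `slotOnlyCheck`, and `guardedCheck` below are hypothetical stand-ins written for illustration; they are not the Nereids classes or the code in this PR, and only `containsUniqueFunction` borrows its name from the description.

```java
import java.util.List;
import java.util.Set;

public class UniqueFunctionGuardSketch {
    // Hypothetical stand-in for a filter conjunct: the slots it reads,
    // plus a flag saying whether it calls a unique function such as rand().
    public static final class Conjunct {
        public final String sql;
        public final Set<String> inputSlots;
        public final boolean containsUniqueFunction;

        public Conjunct(String sql, Set<String> inputSlots, boolean containsUniqueFunction) {
            this.sql = sql;
            this.inputSlots = inputSlots;
            this.containsUniqueFunction = containsUniqueFunction;
        }
    }

    // Slot-only check: "every slot the conjunct reads is available below".
    // For an empty slot set this is vacuously true.
    public static boolean slotOnlyCheck(Conjunct c, Set<String> availableSlots) {
        return availableSlots.containsAll(c.inputSlots);
    }

    // Guarded check: refuse to push a conjunct that contains a unique function,
    // then fall back to the slot check.
    public static boolean guardedCheck(Conjunct c, Set<String> availableSlots) {
        return !c.containsUniqueFunction && availableSlots.containsAll(c.inputSlots);
    }

    public static void main(String[] args) {
        Set<String> partitionKeys = Set.of("k1");
        Conjunct onKey = new Conjunct("k1 = 1", Set.of("k1"), false);
        Conjunct volatilePred = new Conjunct("rand() > 0.5", Set.of(), true);

        for (Conjunct c : List.of(onKey, volatilePred)) {
            System.out.println(c.sql
                    + " | slot-only: " + slotOnlyCheck(c, partitionKeys)
                    + " | guarded: " + guardedCheck(c, partitionKeys));
        }
    }
}
```

In this sketch the slot-only check accepts `rand() > 0.5` because `containsAll` of an empty set is always true, while the guarded check rejects it and still accepts the ordinary predicate on `k1`.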
---------

Co-authored-by: yujun777 <yujun777@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
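The repeated-evaluation problem behind the AddProjectForVolatileExpression item in the commit message can also be sketched in isolation. The example below assumes a BETWEEN predicate is expanded into two comparisons, and uses a hypothetical `replay` helper that returns fixed values in place of `rand()` so the behaviour is reproducible. It illustrates the idea only and does not mirror the Doris implementation.

```java
import java.util.function.DoubleSupplier;

public class VolatileBetweenSketch {
    // Naive expansion of "e BETWEEN lo AND hi" into "e >= lo AND e <= hi",
    // where each textual occurrence of e is evaluated independently.
    public static boolean expandedTwice(DoubleSupplier volatileExpr, double lo, double hi) {
        return volatileExpr.getAsDouble() >= lo && volatileExpr.getAsDouble() <= hi;
    }

    // Materialized form: evaluate the volatile expression once (as a child
    // project would) and let both comparisons share that single value.
    public static boolean materializedOnce(DoubleSupplier volatileExpr, double lo, double hi) {
        double value = volatileExpr.getAsDouble();
        return value >= lo && value <= hi;
    }

    // Hypothetical deterministic stand-in for rand(): replays the given values in order.
    public static DoubleSupplier replay(double... values) {
        int[] next = {0};
        return () -> values[next[0]++ % values.length];
    }

    public static void main(String[] args) {
        // The first draw is 0.7, which lies outside [0.3, 0.6]; the second draw is 0.4.
        System.out.println("expanded twice:    " + expandedTwice(replay(0.7, 0.4), 0.3, 0.6));
        System.out.println("materialized once: " + materializedOnce(replay(0.7, 0.4), 0.3, 0.6));
    }
}
```

With the draws 0.7 and 0.4, the expanded form compares two different values and accepts the row, even though neither single draw was tested against both bounds. The materialized form tests 0.7 against both bounds and rejects it.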
run buildall
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
cherry-pick: #62742