branch-4.1: [fix](nereids) Guard UniqueFunction in multiple filter/topn pushdown rules #62742 #64336

Open
yujun777 wants to merge 1 commit into
apache:branch-4.1 from
yujun777:pick-pr-62742-branch-4.1

@yujun777

Contributor

cherry-pick: #62742

…rules (apache#62742)

Problem Summary:

Several Nereids rewrite rules still moved predicates
containing non-idempotent (unique) functions such as rand() / uuid() /
random_bytes() across operator boundaries in ways that changed query
semantics. The common root cause is that a predicate like `rand() > 0.5`
has an empty input-slot set, so the `containsAll(emptySet)` / `allMatch`
guards used by these rules silently returned true and allowed unsafe
pushdown / elimination.
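
The vacuous-truth problem described above can be reproduced outside the optimizer. The following is a minimal sketch, not the Nereids code: `Conjunct`, `canPushUnguarded`, `canPushGuarded`, and `pushable` are hypothetical names introduced for illustration, and the boolean field only stands in for whatever `containsUniqueFunction()` reports on a real expression.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class UniqueFunctionGuardSketch {
    /** Hypothetical stand-in for a filter conjunct; not the Nereids Expression class. */
    static final class Conjunct {
        final String sql;
        final Set<String> inputSlots;
        final boolean containsUniqueFunction;

        Conjunct(String sql, Set<String> inputSlots, boolean containsUniqueFunction) {
            this.sql = sql;
            this.inputSlots = inputSlots;
            this.containsUniqueFunction = containsUniqueFunction;
        }
    }

    /** Unguarded check: only asks whether the child produces every input slot. */
    static boolean canPushUnguarded(Conjunct c, Set<String> childOutput) {
        // containsAll(emptySet) is vacuously true, so a slot-free predicate always passes.
        return childOutput.containsAll(c.inputSlots);
    }

    /** Guarded check: a conjunct with a unique function is never pushed. */
    static boolean canPushGuarded(Conjunct c, Set<String> childOutput) {
        return !c.containsUniqueFunction && childOutput.containsAll(c.inputSlots);
    }

    /** Split loop: returns the SQL text of the conjuncts the guarded check lets through. */
    static List<String> pushable(List<Conjunct> conjuncts, Set<String> childOutput) {
        List<String> result = new ArrayList<>();
        for (Conjunct c : conjuncts) {
            if (canPushGuarded(c, childOutput)) {
                result.add(c.sql);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> childOutput = new HashSet<>(Arrays.asList("a", "b"));
        Conjunct plain = new Conjunct("a > 1", Collections.singleton("a"), false);
        Conjunct volatileOne =
                new Conjunct("rand() > 0.5", Collections.<String>emptySet(), true);

        System.out.println(canPushUnguarded(volatileOne, childOutput)); // true
        System.out.println(canPushGuarded(volatileOne, childOutput));   // false
        System.out.println(pushable(Arrays.asList(plain, volatileOne), childOutput)); // [a > 1]
    }
}
```

In this sketch the slot check alone cannot tell `rand() > 0.5` apart from a harmless constant predicate, which is why the guard has to be an explicit, separate condition.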

This PR adds `containsUniqueFunction()` guards to the following rules:

1. PushDownFilterThroughRepeat: skip conjuncts with unique functions.
   Pushing `rand() > x` below Repeat changes which rows feed each
   grouping set and alters aggregate results.

2. PushDownFilterThroughWindow: skip conjuncts with unique functions.
   Pushing a unique predicate below a window operator re-samples the
   base rows and changes every window-function value.

3. PushDownFilterThroughPartitionTopN: same as Window - skip unique
   conjuncts in the split loop.

4. PushDownFilterThroughSetOperation: do not push volatile conjuncts
   below `UNION DISTINCT`, `INTERSECT`, or `EXCEPT`; only `UNION ALL`
   keeps the original row-to-row semantics.

5. PushDownJoinOtherCondition: keep volatile ON predicates in the join
   when pushing them into a single child would change evaluation from
   per joined pair to per input row.

6. AddProjectForVolatileExpression: when a volatile expression is
   repeated after rewrites such as BETWEEN expansion, materialize it via
   a child project so repeated references share one value instead of
   being re-evaluated independently.

7. InferPredicates / JoinUtils.isHashJoinCondition: add the same unique
   function guards to avoid deriving or classifying unsafe predicates.
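
To illustrate item 6, here is a small sketch of why a repeated volatile expression has to be materialized once. It is not the `AddProjectForVolatileExpression` implementation: `scriptedRand`, `betweenReevaluated`, and `betweenMaterialized` are hypothetical helpers, the bounds 0.2 and 0.4 are made up, and a scripted sequence replaces `rand()` so the outcome is deterministic.

```java
import java.util.function.DoubleSupplier;

final class VolatileBetweenSketch {
    /** Deterministic stand-in for rand(): returns the given values in order, cycling. */
    static DoubleSupplier scriptedRand(double... values) {
        int[] next = {0};
        return () -> values[next[0]++ % values.length];
    }

    /** Expanded form `r >= lo AND r <= hi` where each reference re-evaluates r. */
    static boolean betweenReevaluated(DoubleSupplier rand, double lo, double hi) {
        return rand.getAsDouble() >= lo && rand.getAsDouble() <= hi;
    }

    /** Materialized form: evaluate once, as a child project would, then reuse the value. */
    static boolean betweenMaterialized(DoubleSupplier rand, double lo, double hi) {
        double r = rand.getAsDouble();
        return r >= lo && r <= hi;
    }

    public static void main(String[] args) {
        // The scripted source yields 0.3 first and 0.9 second.
        System.out.println(betweenReevaluated(scriptedRand(0.3, 0.9), 0.2, 0.4));  // false
        System.out.println(betweenMaterialized(scriptedRand(0.3, 0.9), 0.2, 0.4)); // true
    }
}
```

With re-evaluation the two comparisons see different values (0.3 and 0.9), so a row whose single draw lies inside the range can still be rejected; evaluating once and sharing the value restores the meaning of the original `BETWEEN`.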

Fix wrong results when predicates containing rand(), uuid(),
random_bytes(), or uuid_numeric() are pushed across Repeat, Window,
PartitionTopN, SetOperation, or Join boundaries, and when a repeated
volatile expression is re-evaluated independently instead of being
materialized once for reuse.

---------

Co-authored-by: yujun777 <yujun777@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@yujun777 yujun777 requested a review from yiguolei as a code owner June 10, 2026 01:28
@yujun777

Contributor Author

run buildall

@hello-stephen

Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?
