branch-4.0: [fix](nereids) clamp the merged limit of MERGE_TOP_N by the parent offset #64306 #64352

Open: github-actions[bot] wants to merge 1 commit into branch-4.0 from auto-pick-64306-branch-4.0
@github-actions

Contributor

Cherry-picked from #64306

…fset (#64306)

`MergeTopNs` (the `MERGE_TOP_N` rewrite rule) merges a parent `TopN` into its child `TopN` when their order keys are compatible. When the parent `TopN` carries a non-zero `OFFSET`, the merged limit was computed as `min(parent.limit, child.limit)`, which ignores that the parent offset consumes rows from the child's output. The merged `TopN` therefore keeps too many rows and the query returns a wrong result.

Example:

```sql
SELECT * FROM (SELECT k, v FROM t ORDER BY k LIMIT 5) s ORDER BY k LIMIT 3 OFFSET 4;
```

The inner `ORDER BY k LIMIT 5` yields 5 rows; the outer `LIMIT 3 OFFSET 4` skips 4 of them, so only 1 row should remain. Before this PR the rule merged the two `TopN` into `OFFSET 4 LIMIT 3` (instead of `OFFSET 4 LIMIT 1`), so it returned 3 rows.
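
The row counts above can be reproduced outside the optimizer. The following is a minimal sketch, not Doris code: the class `NestedTopNDemo`, the helper `topN`, and the ten-key sample table are all hypothetical, and the helper models `ORDER BY k LIMIT n OFFSET m` over a plain list of integer keys.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical illustration only; none of these names exist in Doris.
class NestedTopNDemo {
    // Models "ORDER BY k LIMIT limit OFFSET offset" over a list of keys.
    static List<Integer> topN(List<Integer> rows, long offset, long limit) {
        return rows.stream()
                .sorted()
                .skip(offset)
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Assumed sample table t with keys 1..10.
        List<Integer> t = IntStream.rangeClosed(1, 10).boxed().collect(Collectors.toList());

        // Reference semantics: inner LIMIT 5, then outer LIMIT 3 OFFSET 4.
        List<Integer> nested = topN(topN(t, 0, 5), 4, 3);
        // Merged plan before the fix: OFFSET 4 LIMIT 3 applied directly to t.
        List<Integer> mergedBefore = topN(t, 4, 3);
        // Merged plan after the fix: OFFSET 4 LIMIT 1.
        List<Integer> mergedAfter = topN(t, 4, 1);

        System.out.println("nested       = " + nested);       // [5]
        System.out.println("mergedBefore = " + mergedBefore); // [5, 6, 7]
        System.out.println("mergedAfter  = " + mergedAfter);  // [5]
    }
}
```

Only the `OFFSET 4 LIMIT 1` plan matches the nested evaluation; the `OFFSET 4 LIMIT 3` plan leaks two rows that the inner `LIMIT 5` would have cut off.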

Fix: clamp the merged limit by `max(child.limit - parent.offset, 0)`, the same semantics already used by `MergeLimits.mergeLimit` for consecutive limits. The bug only triggers when the outer `TopN` has a non-zero offset (offset = 0 makes both formulas equal).
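
As a rough sketch of the arithmetic, the two formulas can be compared side by side. This is not the code in `MergeTopNs` or `MergeLimits.mergeLimit`; the class `TopNLimitMerge` and both method names are hypothetical, and the sketch assumes the clamp is combined with the parent limit through `min`, which is what the `OFFSET 4 LIMIT 1` example implies.

```java
// Hypothetical illustration only; these names do not exist in Doris.
final class TopNLimitMerge {
    // Formula described as the pre-fix behavior: ignores the parent offset.
    static long unclampedLimit(long parentLimit, long childLimit) {
        return Math.min(parentLimit, childLimit);
    }

    // Formula described as the fix: the parent can see at most
    // max(child.limit - parent.offset, 0) rows of the child's output.
    static long clampedLimit(long parentLimit, long parentOffset, long childLimit) {
        long remaining = Math.max(childLimit - parentOffset, 0);
        return Math.min(parentLimit, remaining);
    }

    public static void main(String[] args) {
        // Example from the description: inner LIMIT 5, outer LIMIT 3 OFFSET 4.
        System.out.println(unclampedLimit(3, 5));  // 3
        System.out.println(clampedLimit(3, 4, 5)); // 1
        // With a zero parent offset both formulas agree.
        System.out.println(clampedLimit(3, 0, 5)); // 3
        // A parent offset beyond the child limit leaves no rows.
        System.out.println(clampedLimit(3, 7, 5)); // 0
    }
}
```

The `max(..., 0)` guard matters in the last case: without it the merged limit would go negative when the parent offset exceeds the child limit.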

The existing unit test `MergeTopNsTest.testOffset` asserted the buggy value (`limit == 10`, while the correct value is `9`); this PR corrects that assertion as well.

### Release note

Fix the wrong result produced by the `MERGE_TOP_N` optimization when an outer `ORDER BY ... LIMIT` carries a non-zero `OFFSET` over an inner `ORDER BY ... LIMIT`.

github-actions bot requested a review from morningman as a code owner on June 10, 2026 05:10

@hello-stephen

Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@hello-stephen

Contributor

run buildall
