[fix](fe) Fix VSlotRef invalid slot id in INTERSECT/EXCEPT with ExprId reuse#62296
[fix](fe) Fix VSlotRef invalid slot id in INTERSECT/EXCEPT with ExprId reuse#62296yujun777 wants to merge 1 commit intoapache:masterfrom
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
|
/review |
There was a problem hiding this comment.
Findings
- High - The fix is incomplete because the same ExprId->SlotRef overwrite pattern still exists in the functionally parallel recursive CTE translator path (
visitPhysicalRecursiveUnion()), so recursive unions can still bind child result expressions to the wrong slots when ExprIds are reused.
Critical checkpoints
- Goal of current task: Partially achieved.
visitPhysicalSetOperation()is fixed and covered by regression tests, but the same translator invariant is still violated invisitPhysicalRecursiveUnion(). - Modification size/focus: Yes, the patch is small and focused.
- Concurrency: Not involved.
- Special lifecycle/static init: No special lifecycle concerns in the changed code.
- Configuration: None added.
- Compatibility: No FE/BE protocol or storage compatibility change.
- Parallel code paths: Not fully covered.
visitPhysicalRecursiveUnion()is a parallel path that still uses the old translation order. - Special conditional checks: The new ordering comment is clear; no extra conditional-risk issue found.
- Test coverage: Added good regression coverage for non-recursive set operations, but there is no recursive CTE regression for the same ExprId-reuse invariant.
- Observability: No additional observability needed for this translator-only fix.
- Transaction/persistence: Not involved.
- Data writes/modifications: Not involved.
- FE/BE variable passing: Not involved.
- Performance: Acceptable; the change is still linear and does not add meaningful overhead.
- Other issues: None beyond the missed recursive CTE path.
| @Override | ||
| public PlanFragment visitPhysicalSetOperation( | ||
| PhysicalSetOperation setOperation, PlanTranslatorContext context) { | ||
| // Translate each child's result exprs and distribute exprs immediately after |
There was a problem hiding this comment.
High: this only fixes visitPhysicalSetOperation(), but visitPhysicalRecursiveUnion() still follows the old order: visit both children, call generateTupleDesc(recursiveCte.getOutput(), ...), then translate recursiveCte.getRegularChildrenOutputs() (see lines 2302-2328 in the current file). That path uses the same global PlanTranslatorContext.exprIdToSlotRef map, so later child visits and the parent tuple creation still overwrite earlier mappings. Because LogicalRecursiveUnion.buildNewOutputs() can preserve an existing output ExprId, a recursive CTE with reused ExprIds can still translate to the wrong SlotRef and hit the same invalid-slot failure. Please apply the same interleaving strategy here and add a regression under recursive_cte.
FE UT Coverage ReportIncrement line coverage |
…d reuse Issue Number: close #CIR-19889 Problem Summary: When the Nereids optimizer produces set operations (INTERSECT/EXCEPT/UNION DISTINCT) where ExprIds are reused between: 1. A set operation output and its child output (parent-child sharing) 2. Multiple children outputs (sibling sharing) The plan translator in visitPhysicalSetOperation() would translate children result expressions and distribute expressions AFTER all children had been visited. Since each child visit calls generateTupleDesc() which overwrites the ExprId→SlotRef mapping via HashMap.put(), later children overwrite earlier children mappings. This caused the translated result expressions to reference wrong slots, leading to BE errors: "VSlotRef have invalid slot id". The fix interleaves child visits with result/distribute expression translation, translating each child expressions immediately after visiting that child, before the next child visit can overwrite the ExprId→SlotRef mapping. Fixed a bug where INTERSECT/EXCEPT queries with CTEs and expressions could fail with "VSlotRef have invalid slot id" error when the optimizer reuses ExprIds across set operation children. - Test: Regression test added (set_operation_exprid_reuse.groovy) - Behavior changed: No - Does this need documentation: No Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
bf4213e to
8aa64f0
Compare
|
run buildall |
|
/review |
|
Re: review comment on |
There was a problem hiding this comment.
One issue found.
regression-test/data/query_p0/set_operations/set_operation_exprid_reuse.outencodes the wrong expected result forintersect_nulls. Doris already treatsNULLas equal for set operations: the existingquery_p0/set_operations/set_with_nullsuite expectsINTERSECT ... SELECT NULLto return\N, not to drop it. The new expected output only contains1, so this regression test would bless a semantic regression instead of protecting against it.
Critical checkpoint conclusions:
- Goal / correctness / test proof: The goal is to fix
ExprId -> SlotRefoverwrite during set-operation translation. The code change inPhysicalPlanTranslator.visitPhysicalSetOperation()does address that overwrite hazard by translating each child immediately after visiting it, and the added suite exercises the reported failing shapes. However, one expected result is incorrect, so the current tests do not yet prove the fix safely. - Scope / minimality: The code change is small and focused. Extending the same interleaving pattern to
visitPhysicalRecursiveUnion()is also consistent with the same mutable-map mechanism. - Concurrency: No new concurrency or lock-safety concerns in the touched code path.
- Lifecycle / static initialization: No special lifecycle or static-init concerns introduced here.
- Config changes: None.
- Compatibility: No FE/BE protocol or storage-format compatibility changes detected.
- Parallel code paths: The analogous recursive CTE path was updated too, which is good. I did not find another remaining set-operation translation path with the same overwrite pattern in this file.
- Special condition checks: The new comments explain the
ExprIdreuse hazard clearly enough. - Test coverage: Coverage improved for INTERSECT / EXCEPT / UNION DISTINCT and nested shapes, but the
intersect_nullsoracle is inconsistent with existing Doris semantics and should be corrected. No recursive CTE regression was added for the newly touched parallel path. - Observability: No additional observability seems necessary for this FE translation fix.
- Transaction / persistence: Not applicable.
- Data writes / modifications: Not applicable beyond regression-test DDL/DML setup.
- FE/BE variable passing: No new variables added.
- Performance: The interleaving avoids extra retranslation hazards without adding meaningful overhead.
- Other issues: None beyond the incorrect expected result above.
| 30 Z | ||
|
|
||
| -- !intersect_nulls -- | ||
| 1 |
There was a problem hiding this comment.
INTERSECT in Doris treats NULL as equal in set semantics. The existing query_p0/set_operations/set_with_null suite already expects SELECT k1 FROM d_table INTERSECT SELECT NULL to return \N. Recording only 1 here would bless a semantic regression in the new test rather than verify the translator fix. This case should include the NULL row as well.
FE UT Coverage ReportIncrement line coverage |
What problem does this PR solve?
Issue Number: close #CIR-19889
Problem Summary:
When the Nereids optimizer produces set operations (INTERSECT/EXCEPT/UNION DISTINCT) where ExprIds are reused between:
The plan translator in
visitPhysicalSetOperation()would translate children's result expressions and distribute expressions AFTER all children had been visited. Since each child visit callsgenerateTupleDesc()which overwrites theExprId→SlotRefmapping viaHashMap.put(), later children overwrite earlier children's mappings. This caused the translated result expressions to reference wrong slots, leading to BE errors:VSlotRef have invalid slot id.The fix interleaves child visits with result/distribute expression translation — translating each child's expressions immediately after visiting that child, before the next child's visit can overwrite the
ExprId→SlotRefmapping.Reproducing query:
Release note
Fixed a bug where INTERSECT/EXCEPT queries with CTEs and expressions could fail with "VSlotRef have invalid slot id" error when the optimizer reuses ExprIds across set operation children.
Check List (For Author)