[fix](regression) Wait for stable colocate groups in plan tests #64361
morrySnow wants to merge 1 commit
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Colocate join plan assertions could run while the colocate group was still unstable, causing EXPLAIN to omit COLOCATE intermittently. Add a shared regression-suite helper that waits for the target group to become stable and use it before positive colocate plan assertions.
### Release note
None
### Check List (For Author)
- Test: Regression test
    - test_colocate_join_of_column_order
    - colocate_join_with_rollup
- Behavior changed: No
- Does this need documentation: No
/review
run buildall
I found two issues to address before merge.
Critical checkpoint conclusions:
- Goal/test proof: The helper and most call sites address unstable colocate groups, but `test_colocate_join_of_column_order` still lets the negative plan assertions pass while the group may be unstable.
- Scope/focus: The change is small and focused.
- Concurrency/lifecycle/config/compatibility/transaction/persistence/data writes/observability: Not applicable; this PR only changes regression Groovy helpers/tests.
- Parallel paths: Positive `COLOCATE` assertions in the touched suites are covered; the first negative column-order path should use the same stable precondition.
- Test coverage/results: No `.out` files changed. I did not run regression tests in this review; I inspected the diff and ran `git diff --check`.
- User focus: No additional user-provided review focus was supplied.
    }

    void waitForColocateGroupStable(String dbName, String groupName, int timeoutSeconds = 60) {
        String fullGroupName = "${dbName}.${groupName}"
This shared helper does not handle Doris global colocate groups. `colocate_with` accepts names starting with `__global__`, and `SHOW PROC '/colocation_group'` reports those as the bare global name rather than `db.__global__...` (see `ColocateTableIndex.GroupId.getFullGroupName`). With the current unconditional prefixing, `waitForColocateGroupStable("__global__...")` will time out even when the group is stable. Please either leave global group names unprefixed here or make this helper explicitly db-scoped.
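A minimal sketch of the first option (leave global names unprefixed), written in Python purely for illustration rather than in the suite's Groovy. The function name `full_colocate_group_name` is hypothetical, and the rule it encodes is only what this comment states: names starting with `__global__` are listed bare, while other names are listed as `db.group`.

```python
GLOBAL_GROUP_PREFIX = "__global__"  # prefix named in the review comment


def full_colocate_group_name(db_name: str, group_name: str) -> str:
    """Return the name to look for in the colocation group listing.

    Assumption (taken from the review comment, not verified here):
    global groups appear under their bare name, db-scoped groups as
    "<db>.<group>".
    """
    if group_name.startswith(GLOBAL_GROUP_PREFIX):
        # Global group: do not prepend the database name.
        return group_name
    # Db-scoped group: qualify with the database name.
    return f"{db_name}.{group_name}"
```

With this shape, callers keep passing the same `colocate_with` value they used when creating the table, and only the lookup key changes.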
        sql("select * from test_colocate_join_of_column_order_t1 a join test_colocate_join_of_column_order_t2 b on a.k1=b.k2 and a.v=b.v;")
        notContains "COLOCATE"
    }
    waitForColocateGroupStable("group_column_order")
This wait is after the four negative `notContains "COLOCATE"` assertions above, so those checks can still pass for the same unstable-group reason this PR is fixing. When the group is unstable, Nereids skips colocate join regardless of the join condition, which means these negative cases do not actually verify the column-order logic. Please move this wait to before the first `explain` after the inserts/planner setup so both the negative and positive assertions run with a stable colocate group.
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary:
Colocate join plan assertions may run before `ColocateTableCheckerAndBalancer` marks the colocate group stable. In that state, Nereids can omit `COLOCATE` from the plan and make otherwise correct regression cases fail intermittently.

This PR adds a shared `waitForColocateGroupStable` helper to the regression test framework. It polls `SHOW PROC '/colocation_group'` until the target group's `IsStable` value is true, and fails on timeout. All positive `COLOCATE` plan assertions backed by explicit `colocate_with` groups now wait for stability first. The existing backup/restore-local waiting closure is replaced with the shared helper.
### Release note
None
### Check List (For Author)
- `./run-regression-test.sh --clean --run -d correctness_p0 -s test_colocate_join_of_column_order`
- `./run-regression-test.sh --run -d query_p0/join -s colocate_join_with_rollup`
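The poll-until-stable behaviour described in the Problem Summary can be sketched roughly as follows. This is not the PR's Groovy helper: it is a Python illustration in which `fetch_groups` is a hypothetical callable standing in for running `SHOW PROC '/colocation_group'`, and the `GroupName` column key is an assumption (only `IsStable` is named in the description). The clock and sleep functions are injectable so the loop can be exercised without real waiting.

```python
import time
from typing import Callable, Dict, List


def wait_for_colocate_group_stable(
    fetch_groups: Callable[[], List[Dict[str, str]]],  # hypothetical query wrapper
    full_group_name: str,
    timeout_seconds: float = 60.0,
    poll_interval_seconds: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll until the named group reports IsStable == true, else raise."""
    deadline = clock() + timeout_seconds
    while True:
        for row in fetch_groups():
            # "GroupName" is an assumed column key; "IsStable" is from the PR text.
            name_matches = row.get("GroupName") == full_group_name
            is_stable = str(row.get("IsStable", "")).lower() == "true"
            if name_matches and is_stable:
                return
        if clock() >= deadline:
            raise TimeoutError(
                f"colocate group {full_group_name} not stable "
                f"after {timeout_seconds} seconds"
            )
        sleep(poll_interval_seconds)
```

Failing loudly on timeout, rather than continuing, keeps an unstable group from showing up later as a confusing plan-assertion failure.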