Skip to content

test: stabilize flaky TestSubmitBatchEnqueuesTableModeJob#69657

Open
flaky-claw wants to merge 1 commit into
pingcap:masterfrom
flaky-claw:flakyfixer/case_6e6df25df659-a1
Open

test: stabilize flaky TestSubmitBatchEnqueuesTableModeJob#69657
flaky-claw wants to merge 1 commit into
pingcap:masterfrom
flaky-claw:flakyfixer/case_6e6df25df659-a1

Conversation

@flaky-claw

@flaky-claw flaky-claw commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number: close #69609

Problem Summary:
Flaky test TestSubmitBatchEnqueuesTableModeJob in pkg/ddl/jobsubmit intermittently fails, so this PR stabilizes that path.

What changed and how does it work?

Root Cause

Live mock DDL ownership raced with a queue-inspection test that inserts fabricated DDL jobs.

Fix

Cancelling DDL ownership makes the submit tests observe SubmitBatch’s queued rows instead of scheduler side effects.

Verification

Spec:

  • target: pkg/ddl/jobsubmit :: TestSubmitBatchEnqueuesTableModeJob
  • strategy: tidb.issue_scoped.v2
  • plan mode: BASELINE_ONLY
  • requirements: required case must execute; no skip; repeat count = 1
  • execution surface: GO_TEST_WITH_TAGS
  • build tags: intest, deadlock
  • baseline gates: required_flaky_gate, build_safety_gate, intent_guard_gate
  • feedback surface source: baseline_only

Observed result:

  • status: failed
  • required case executed: yes
  • submission decision: ALLOWED
  • note: Required flaky case executed during validation.
    Required flaky case was not skipped.
    target-flaky passed.
    package-failpoint passed.

Gate checklist:

  • target-flaky: PASS
  • package-failpoint: PASS
  • build: BLOCKED

Commands:

  • go test -json -tags=intest,deadlock ./pkg/ddl/jobsubmit -run '^TestSubmitBatchEnqueuesTableModeJob$' -count=1
  • ./tools/check/failpoint-go-test.sh pkg/ddl/jobsubmit -count=1

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Fixes #69609

Summary by CodeRabbit

  • Tests
    • Improved test environment setup for DDL job submission to more closely match a real database domain.
    • Adjusted mock initialization to avoid owner-manager background activity during tests, making results more stable and reliable.

@ti-chi-bot ti-chi-bot Bot added release-note-none Denotes a PR that doesn't merit a release note. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jul 3, 2026
@ti-chi-bot

ti-chi-bot Bot commented Jul 3, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign gmhdbjd for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai Bot commented Jul 3, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

The test helper newSubmitTestEnv in submit_test.go now creates both a mock store and a mock domain, then cancels the DDL owner-manager campaign to prevent owner election interference during tests.

Changes

Test environment stabilization

Layer / File(s) Summary
Cancel owner campaign in test setup
pkg/ddl/jobsubmit/submit_test.go
newSubmitTestEnv now uses testkit.CreateMockStoreAndDomain to obtain a domain and calls dom.DDL().OwnerManager().CampaignCancel() after setup.

Estimated code review effort: 1 (Trivial) | ~3 minutes

Suggested labels: ok-to-test, approved, lgtm, needs-cherry-pick-release-8.5

Suggested reviewers: Defined2014, mjonss, hawkingrei

Poem

A rabbit hopped into the test suite's den,
Cancelled a campaign, then hopped again,
No more flaky jumps in the night,
Just steady hops, calm and right,
Thump-thump — the tests now pass just fine! 🐇

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly matches the main change: stabilizing a flaky jobsubmit test.
Description check ✅ Passed The description follows the template and includes the issue number, problem summary, change details, testing, and release note.
Linked Issues check ✅ Passed The change addresses issue #69609 by removing DDL ownership interference in the targeted flaky test.
Out of Scope Changes check ✅ Passed The only code change is a targeted test-environment tweak in the affected test file.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

level=error msg="Running error: context loading failed: failed to load packages: failed to load packages: failed to load with go/packages: context deadline exceeded"
level=error msg="Timeout exceeded: try increasing it by passing --timeout option"


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@codecov

codecov Bot commented Jul 3, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.4815%. Comparing base (8b7e209) to head (87cf3c8).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #69657        +/-   ##
================================================
- Coverage   76.3268%   74.4815%   -1.8453%     
================================================
  Files          2041       2060        +19     
  Lines        560588     580446     +19858     
================================================
+ Hits         427879     432325      +4446     
- Misses       131808     147228     +15420     
+ Partials        901        893         -8     
Flag Coverage Δ
integration 42.2425% <ø> (+2.5372%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 60.4471% <ø> (ø)
parser ∅ <ø> (∅)
br 47.4650% <ø> (-15.2563%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/ddl/jobsubmit/submit_test.go (1)

72-72: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add a comment explaining why the campaign is cancelled.

The intent behind dom.DDL().OwnerManager().CampaignCancel() (preventing owner-driven scheduler side effects so the test observes SubmitBatch's queued rows) isn't obvious from the call alone.

As per coding guidelines: "Comments SHOULD explain non-obvious intent, constraints, invariants, concurrency guarantees... SHOULD NOT restate what the code already makes clear."

📝 Proposed comment
 	store, dom := testkit.CreateMockStoreAndDomain(t, mockstore.WithStoreType(mockstore.EmbedUnistore))
+	// Cancel the DDL owner campaign so the background scheduler doesn't race with
+	// SubmitBatch and process jobs before the test can observe the queued rows.
 	dom.DDL().OwnerManager().CampaignCancel()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/ddl/jobsubmit/submit_test.go` at line 72, Add a brief comment near
dom.DDL().OwnerManager().CampaignCancel() explaining that the campaign is
cancelled to suppress owner-driven scheduler side effects so the test can
observe SubmitBatch’s queued rows deterministically. Keep the comment focused on
intent and the test invariant, and place it in submit_test.go around the
CampaignCancel call so future readers understand why this setup is required.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/ddl/jobsubmit/submit_test.go`:
- Around line 71-72: The test setup in CreateMockStoreAndDomain with
EmbedUnistore still allows the DDL owner loop to start before
OwnerManager().CampaignCancel() runs, so the shutdown is only best-effort.
Update the submit_test setup to use a helper or testkit path that prevents
auto-campaigning/owner startup before the test begins, or otherwise ensure DDL
ownership is disabled before CreateMockStoreAndDomain returns, so the job
submission test runs in full isolation.

---

Nitpick comments:
In `@pkg/ddl/jobsubmit/submit_test.go`:
- Line 72: Add a brief comment near dom.DDL().OwnerManager().CampaignCancel()
explaining that the campaign is cancelled to suppress owner-driven scheduler
side effects so the test can observe SubmitBatch’s queued rows
deterministically. Keep the comment focused on intent and the test invariant,
and place it in submit_test.go around the CampaignCancel call so future readers
understand why this setup is required.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: cbd661c8-62de-44a9-bbe5-e0be62f912f7

📥 Commits

Reviewing files that changed from the base of the PR and between 34ef69b and 87cf3c8.

📒 Files selected for processing (1)
  • pkg/ddl/jobsubmit/submit_test.go

Comment on lines +71 to +72
store, dom := testkit.CreateMockStoreAndDomain(t, mockstore.WithStoreType(mockstore.EmbedUnistore))
dom.DDL().OwnerManager().CampaignCancel()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check whether CreateMockStoreAndDomain/bootstrap offers an option to skip owner campaigning
rg -n -B3 -A15 'func CreateMockStoreAndDomain' pkg/testkit/mockstore.go
rg -n -B3 -A20 'func bootstrap\(' pkg/testkit/mockstore.go
rg -n 'DisableAutoCampaign|SkipCampaign|WithoutCampaign' -g '*.go'

Repository: pingcap/tidb

Length of output: 3232


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the owner manager cancellation and campaign startup path.
files=(
  pkg/ddl/jobsubmit/submit_test.go
  pkg/ddl/owner/manager.go
  pkg/domain/domain.go
  pkg/session/bootstrap.go
)
for f in "${files[@]}"; do
  echo "### $f"
  wc -l "$f"
done

echo "### submit_test excerpt"
sed -n '1,140p' pkg/ddl/jobsubmit/submit_test.go

echo "### owner manager excerpt"
sed -n '1,260p' pkg/ddl/owner/manager.go

echo "### session bootstrap excerpt"
sed -n '1,240p' pkg/session/bootstrap.go

echo "### domain bootstrap excerpt"
sed -n '1,240p' pkg/domain/domain.go

Repository: pingcap/tidb

Length of output: 309


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find the actual owner manager file and inspect the test helper + campaign/cancel implementation.
echo "### candidate files"
fd -a 'manager.go' pkg | sed -n '1,40p'
fd -a 'bootstrap.go' pkg | sed -n '1,40p'
fd -a 'submit_test.go' pkg/ddl/jobsubmit | sed -n '1,20p'

echo "### submit_test outline"
ast-grep outline pkg/ddl/jobsubmit/submit_test.go --view expanded | sed -n '1,220p'

mgr="$(fd -a 'manager.go' pkg | grep '/owner/manager.go$' || true)"
if [ -n "$mgr" ]; then
  echo "### owner manager outline: $mgr"
  ast-grep outline "$mgr" --view expanded | sed -n '1,260p'
  echo "### owner manager cancel/campaign snippets"
  rg -n -A30 -B8 'func \(.*CampaignCancel|func \(.*Campaign|BreakCampaignLoop|closeSession|campaignLoop|CampaignOwner' "$mgr"
fi

echo "### bootstrap helpers"
for f in pkg/testkit/mockstore.go pkg/session/bootstrap.go pkg/domain/domain.go; do
  if [ -f "$f" ]; then
    echo "## $f"
    rg -n -A20 -B8 'func CreateMockStoreAndDomain|func bootstrap|BootstrapSession|campaign|owner' "$f"
  fi
done

Repository: pingcap/tidb

Length of output: 50368


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the mock owner manager path and the DDL owner initialization path for unistore.
echo "### owner mock manager"
rg -n -A80 -B20 'func NewMockManager|type mockManager|CampaignOwner|CampaignCancel|BreakCampaignLoop|IsOwner|Close' pkg/owner

echo "### DDL owner initialization"
rg -n -A80 -B20 'OwnerManager\(\)|NewOwnerManager\(|SetOwnerManager|CampaignOwner\(' pkg/ddl pkg/domain pkg/session | sed -n '1,260p'

echo "### submit_test newSubmitTestEnv"
sed -n '70,90p' pkg/ddl/jobsubmit/submit_test.go

Repository: pingcap/tidb

Length of output: 50369


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Read the exact test helper and the DDL owner initialization points that matter here.
sed -n '68,84p' pkg/ddl/jobsubmit/submit_test.go

echo "### DDL owner-related campaign calls"
rg -n -A3 -B3 'CampaignOwner\(\)|CampaignCancel\(\)|OwnerManager\(\)' pkg/ddl pkg/domain pkg/session pkg/testkit/mockstore.go | sed -n '1,240p'

echo "### mock store type branch"
rg -n -A20 -B10 'EmbedUnistore|WithStoreType|CreateMockStoreAndDomain\(' pkg/testkit/mockstore.go pkg/ddl/jobsubmit/submit_test.go | sed -n '1,220p'

Repository: pingcap/tidb

Length of output: 25085


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect where the DDL owner campaign is started during bootstrap/session creation.
rg -n -A30 -B12 'EnableDDL\(|DisableDDL\(|CampaignOwner\(\)|CampaignCancel\(\)|StartOwnerManager|NewDDL\(' pkg/session pkg/domain pkg/ddl | sed -n '1,260p'

echo "### bootstrap session location"
rg -n -A120 -B20 'func BootstrapSession|func BootstrapSession4DistExecution|bootstrap\(s sessionapi.Session\)' pkg/session/bootstrap.go | sed -n '1,260p'

Repository: pingcap/tidb

Length of output: 20365


Owner shutdown here is still best-effort. CreateMockStoreAndDomain(...EmbedUnistore) already starts the DDL owner loop before this returns, so there’s still a small window where background DDL work can run before CampaignCancel() takes effect. If the test needs full isolation, a helper that skips auto-campaigning would be safer.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/ddl/jobsubmit/submit_test.go` around lines 71 - 72, The test setup in
CreateMockStoreAndDomain with EmbedUnistore still allows the DDL owner loop to
start before OwnerManager().CampaignCancel() runs, so the shutdown is only
best-effort. Update the submit_test setup to use a helper or testkit path that
prevents auto-campaigning/owner startup before the test begins, or otherwise
ensure DDL ownership is disabled before CreateMockStoreAndDomain returns, so the
job submission test runs in full isolation.

@ti-chi-bot

ti-chi-bot Bot commented Jul 3, 2026

Copy link
Copy Markdown

@flaky-claw: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
idc-jenkins-ci-tidb/check_dev_2 87cf3c8 link true /test check-dev2
pull-br-integration-test 87cf3c8 link true /test pull-br-integration-test

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-note-none Denotes a PR that doesn't merit a release note. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Flaky test: TestSubmitBatchEnqueuesTableModeJob in pkg/ddl/jobsubmit

1 participant