Skip to content

dxf,ddl: fix dxf add index region batch restriction for region continuity#69627

Open
YangKeao wants to merge 6 commits into
pingcap:masterfrom
YangKeao:fix-dxf-add-index-region-batch
Open

dxf,ddl: fix dxf add index region batch restriction for region continuity#69627
YangKeao wants to merge 6 commits into
pingcap:masterfrom
YangKeao:fix-dxf-add-index-region-batch

Conversation

@YangKeao

@YangKeao YangKeao commented Jul 3, 2026

Copy link
Copy Markdown
Member

What problem does this PR solve?

Issue Number: close #69626
Problem Summary:

When generating DXF backfill subtasks for ADD INDEX, TiDB used region metadata to split the table key range into batches and required adjacent regions to be exactly continuous.

However, region metadata can change during region merge/split or scheduling. RegionCache.LoadRegionsInKeyRange can also load regions in multiple batches, and region layout changes between batches can make the returned region metadata non-continuous or contain duplicate start keys. Because the region information is only used for rough batching, requiring exact region continuity can unnecessarily block DDL execution.

What changed and how does it work?

This PR changes DXF backfill range planning to build subtask ranges from sorted region start keys instead of relying on region end-key continuity.

Specifically:

  • Remove the strict continuous-region check.
  • Sort region metadata by start key.
  • Deduplicate region metadata with the same start key.
  • Use the next batch's first region start key as the current batch end key.
  • Keep the original table range start and end key as the outer boundaries.
  • Return an error if no region metadata is found, instead of falling back to a single range.

This keeps generated subtask ranges continuous and non-overlapping while avoiding unnecessary DDL failures caused by transient region metadata changes.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Summary by CodeRabbit

  • Bug Fixes
    • Improved backfill and merge range planning by loading region metadata more consistently.
    • Prevented duplicate/overlapping region entries by de-duplicating ranges by start key.
    • Streamlined behavior when region metadata is missing, returning errors directly instead of retrying.
  • Tests
    • Added unit coverage for de-duplicating sorted region metadata by start key.

@ti-chi-bot

ti-chi-bot Bot commented Jul 3, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign bb7133 for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jul 3, 2026
@coderabbitai

coderabbitai Bot commented Jul 3, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR removes retry-based region scanning and continuity checks from DXF backfill planning, switches both planners to direct region-cache reads with StartKey-based deduplication, and adds a unit test for the shared deduplication helper.

Changes

Region planning simplification

Layer / File(s) Summary
Remove backoff imports and constants
pkg/ddl/backfilling_dist_scheduler.go
Removes the time and backoff imports and the scanRegionBackoffBase/scanRegionBackoffMax constants.
Deduplication helper and test
pkg/ddl/backfilling_dist_scheduler.go, pkg/ddl/backfilling_dist_scheduler_internal_test.go
Adds deduplicateSortedRegionMetasByStartKey and its StartKey constraint, plus a test file with regionMetaForTest and TestDeduplicateSortedRegionMetasByStartKey.
Direct region loading in generatePlanForPhysicalTable
pkg/ddl/backfilling_dist_scheduler.go
Replaces backoff retry and continuity checks with direct region loading, sorting, deduplication, and batch-derived RowStart/RowEnd computation.
Direct region loading in genMergeTempPlanForOneIndex
pkg/ddl/backfilling_dist_scheduler.go
Replaces backoff retry and continuity checks with direct region loading, sorting, deduplication, and batch-derived SortedKVMeta range computation.

Estimated code review effort: 3 (Moderate) | ~25 minutes

Suggested reviewers: D3Hunter, wjhuang2016

Poem

A rabbit hopped through region maps so wide,
No backoff loops or strictness left to hide.
Sort, dedupe, then hop along the way,
Subtasks split more calmly day by day.
🐇✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Linked Issues check ✅ Passed The PR removes strict continuity checks, deduplicates region boundaries, and keeps ranges covering the table, matching #69626.
Out of Scope Changes check ✅ Passed The changes stay focused on DXF ADD INDEX range planning and its test coverage.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Title check ✅ Passed The title clearly matches the PR’s main change: relaxing DXF ADD INDEX region batching around continuity checks.
Description check ✅ Passed The description matches the template and includes the issue number, problem summary, changes, tests, side effects, and release note.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
pkg/ddl/backfilling_dist_scheduler_internal_test.go (1)

23-47: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Consider covering the early-return edge cases.

The helper's len(regionMetas) < 2 early-return branch (empty/single-element slices) and the no-duplicates case aren't exercised here; only the consecutive-duplicates path is tested.

♻️ Optional additional cases
 func TestDeduplicateSortedRegionMetasByStartKey(t *testing.T) {
+	require.Empty(t, deduplicateSortedRegionMetasByStartKey([]regionMetaForTest{}))
+	require.Equal(t, []regionMetaForTest{{startKey: []byte("a")}},
+		deduplicateSortedRegionMetasByStartKey([]regionMetaForTest{{startKey: []byte("a")}}))
+
 	regions := []regionMetaForTest{
 		{startKey: []byte("a")},
 		{startKey: []byte("a")},
 		{startKey: []byte("b")},
 		{startKey: []byte("b")},
 		{startKey: []byte("d")},
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/ddl/backfilling_dist_scheduler_internal_test.go` around lines 23 - 47,
The current test for deduplicateSortedRegionMetasByStartKey only covers
consecutive duplicates, so add cases for the len(regionMetas) < 2 early return
and for an already-unique sorted slice. Extend
TestDeduplicateSortedRegionMetasByStartKey in
backfilling_dist_scheduler_internal_test.go to verify empty and single-element
inputs are returned unchanged, and add a no-duplicates case that confirms the
slice is preserved when no deduplication is needed.
pkg/ddl/backfilling_dist_scheduler.go (2)

336-386: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚖️ Poor tradeoff

Optional: extract the shared load/sort/dedup/batch-boundary logic.

generatePlanForPhysicalTable and genMergeTempPlanForOneIndex now duplicate the same sequence (load regions → empty check → sort by StartKey → dedup → compute regionBatch → derive per-batch [start,end) boundaries). Only the terminal BackfillSubTaskMeta construction differs. A helper returning the batch boundary pairs (e.g. []struct{ start, end []byte }) would remove the duplication while keeping each caller's meta assembly. Deferrable.

Also applies to: 875-921

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/ddl/backfilling_dist_scheduler.go` around lines 336 - 386, The region
loading, sorting, deduping, batch calculation, and boundary derivation logic is
duplicated between generatePlanForPhysicalTable and genMergeTempPlanForOneIndex.
Extract that shared flow into a helper that returns per-batch start/end
boundaries (or equivalent batch descriptors), then have both callers reuse it
and keep only their BackfillSubTaskMeta-specific assembly separate.

336-340: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

Thread ctx into the backoffer at both call sites.
tikv.NewBackofferWithVars(context.Background(), 20000, nil) detaches LoadRegionsInKeyRange from the DDL job’s cancellation path, so a cancelled job can still keep backing off/retrying. Use ctx here and at the matching index path.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/ddl/backfilling_dist_scheduler.go` around lines 336 - 340, Thread the DDL
context into the backoffer instead of using context.Background() in
BackfillingDistScheduler’s region lookup path. Update the LoadRegionsInKeyRange
call in the backfilling scheduler to build the backoffer with ctx, and apply the
same change at the matching index-path call site so both respect job
cancellation and deadlines.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@pkg/ddl/backfilling_dist_scheduler_internal_test.go`:
- Around line 23-47: The current test for deduplicateSortedRegionMetasByStartKey
only covers consecutive duplicates, so add cases for the len(regionMetas) < 2
early return and for an already-unique sorted slice. Extend
TestDeduplicateSortedRegionMetasByStartKey in
backfilling_dist_scheduler_internal_test.go to verify empty and single-element
inputs are returned unchanged, and add a no-duplicates case that confirms the
slice is preserved when no deduplication is needed.

In `@pkg/ddl/backfilling_dist_scheduler.go`:
- Around line 336-386: The region loading, sorting, deduping, batch calculation,
and boundary derivation logic is duplicated between generatePlanForPhysicalTable
and genMergeTempPlanForOneIndex. Extract that shared flow into a helper that
returns per-batch start/end boundaries (or equivalent batch descriptors), then
have both callers reuse it and keep only their BackfillSubTaskMeta-specific
assembly separate.
- Around line 336-340: Thread the DDL context into the backoffer instead of
using context.Background() in BackfillingDistScheduler’s region lookup path.
Update the LoadRegionsInKeyRange call in the backfilling scheduler to build the
backoffer with ctx, and apply the same change at the matching index-path call
site so both respect job cancellation and deadlines.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d05df9ff-8d40-42ea-87ed-efedf644e2de

📥 Commits

Reviewing files that changed from the base of the PR and between b212b93 and 2d65351.

📒 Files selected for processing (2)
  • pkg/ddl/backfilling_dist_scheduler.go
  • pkg/ddl/backfilling_dist_scheduler_internal_test.go

@codecov

codecov Bot commented Jul 3, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 0% with 97 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.9194%. Comparing base (b212b93) to head (5e6e7c1).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #69627        +/-   ##
================================================
- Coverage   76.3234%   75.9194%   -0.4041%     
================================================
  Files          2041       2077        +36     
  Lines        560728     578711     +17983     
================================================
+ Hits         427967     439354     +11387     
- Misses       131860     137374      +5514     
- Partials        901       1983      +1082     
Flag Coverage Δ
integration 45.6748% <0.0000%> (+5.9696%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 60.4471% <ø> (ø)
parser ∅ <ø> (∅)
br 63.4169% <ø> (+0.6955%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@YangKeao YangKeao changed the title Fix dxf add index region batch Fix dxf add index region batch restriction for region continuity Jul 3, 2026
@YangKeao YangKeao changed the title Fix dxf add index region batch restriction for region continuity dxf,ddlix dxf add index region batch restriction for region continuity Jul 3, 2026
@YangKeao YangKeao changed the title dxf,ddlix dxf add index region batch restriction for region continuity dxf,ddl: fix dxf add index region batch restriction for region continuity Jul 3, 2026
@D3Hunter

D3Hunter commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

i think the right way is to find out why region boundary is non continues and make sure the region cache be synced with pd to avoid such issue, instead of tolerating such NON continue region bound, this can potentially be a fatal correctness issue

@D3Hunter

D3Hunter commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

/hold

@ti-chi-bot ti-chi-bot Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

The restriction for continuity of DXF add index is too hard

2 participants