Skip to content

dxf, server: tune DXF task scheduling controls#69597

Open
D3Hunter wants to merge 10 commits into
pingcap:release-nextgen-202603from
D3Hunter:dxf-rand-schedule-node-clean-max-task
Open

dxf, server: tune DXF task scheduling controls#69597
D3Hunter wants to merge 10 commits into
pingcap:release-nextgen-202603from
D3Hunter:dxf-rand-schedule-node-clean-max-task

Conversation

@D3Hunter

@D3Hunter D3Hunter commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number: ref #61702

Problem Summary:

The nextgen DXF service needs runtime controls for the shared task scheduler, and the scheduler should avoid repeatedly concentrating max-node-limited tasks on the first eligible nodes. Finished-task cleanup also needs to avoid loading an unbounded number of tasks in one cleanup tick.

What changed and how does it work?

This PR:

  • makes the DXF max concurrent task limit configurable through an in-memory atomic setting;
  • adds an HTTP handler for reading and updating the DXF max concurrent task limit;
  • raises the allowed max concurrent task upper bound to support larger nextgen DXF pools;
  • randomly selects up to MaxNodeCount eligible nodes before generating subtasks for the next scheduler step;
  • limits finished-task cleanup queries to one 100-task batch;
  • adds scheduler, storage, proto, and HTTP handler tests for the new behavior.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Manual test:

  • git diff --check upstream/release-nextgen-202603...HEAD

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Summary by CodeRabbit

  • New Features

    • Added runtime API to view and update the DXF max-concurrent-task limit (GET/POST /dxf/task/max_concurrent).
  • Bug Fixes

    • Scheduler now randomly selects eligible nodes when MaxNodeCount applies (instead of deterministic truncation).
    • Enforced concurrency-aware SQL limits for task cleanup and task lookup results.
    • Updated task-history transfer to use bulk SQL statements for faster, batched updates.
  • Tests

    • Added/updated unit and integration tests covering max-concurrent-task configuration, boundary validation, and random node selection.

@ti-chi-bot

ti-chi-bot Bot commented Jul 2, 2026

Copy link
Copy Markdown

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot Bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jul 2, 2026
@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a runtime-configurable DXF max-concurrent-task setting, updates scheduler/storage/benchmark paths to read it dynamically, exposes it over HTTP, rewrites task-history transfer to batch SQL, and refactors import cleanup into batch processing with parallel metering.

Changes

DXF concurrency and task-history flow

Layer / File(s) Summary
Atomic max concurrent task contract and accessors
pkg/dxf/framework/proto/task.go, pkg/dxf/framework/proto/task_test.go, pkg/dxf/framework/proto/BUILD.bazel
Replaces the fixed concurrency limit with atomic-backed accessors and validates the new range behavior in tests.
Scheduler and storage wiring to dynamic limit
pkg/dxf/framework/scheduler/interface.go, pkg/dxf/framework/storage/task_table.go, pkg/dxf/framework/integrationtests/bench_test.go, pkg/dxf/framework/scheduler/scheduler_test.go, pkg/dxf/framework/storage/table_test.go, tests/realtikvtest/addindextest1/disttask_test.go
Updates task-limit calculations, benchmark setup, and SQL query limits to read the current concurrency setting dynamically.
Random node selection in scheduler
pkg/dxf/framework/scheduler/scheduler.go, pkg/dxf/framework/scheduler/scheduler_nokit_test.go
Changes capped eligible-node selection to shuffle before truncating and verifies the resulting dispatch order.
HTTP API for max concurrent task
pkg/server/handler/tikvhandler/dxf.go, pkg/server/http_status.go, pkg/server/handler/tests/dxf_test.go
Adds a GET/POST handler, registers the route, and exercises the endpoint with valid and invalid requests.
Batch task history transfer
pkg/dxf/framework/storage/history.go, pkg/dxf/framework/storage/table_test.go, pkg/dxf/framework/storage/BUILD.bazel
Rewrites task history transfer to bulk SQL and adds SQL-recording tests for the new batch behavior and query limits.

Import cleanup batching and metering

Layer / File(s) Summary
Batch cleanup and metering
pkg/dxf/importinto/clean_up.go
Changes cleanup to operate on batches, groups external file cleanup by cloud storage URI, resets classic table mode, and sends metering in parallel with bounded concurrency.
Cleanup tests and wiring
pkg/dxf/importinto/clean_up_test.go, pkg/dxf/importinto/BUILD.bazel, .agents/skills/tidb-test-guidelines/references/dxf-case-map.md
Adds concurrency and panic tests, plus Bazel and test-map updates for the new import cleanup test file.

Estimated code review effort: 4 (Complex) | ~60 minutes

Suggested labels: approved, lgtm

Suggested reviewers: joechenrh, wjhuang2016, GMHDBJD

Poem

I’m a rabbit, bright and spry,
Hopping through the bits nearby.
Set the limit, batch the hops,
Shuffle tasks and never stop.
🐰

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title is concise and matches the main DXF scheduling-control changes in the PR.
Description check ✅ Passed The description follows the template and covers the issue, changes, tests, and release note, though the problem summary is left blank.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@D3Hunter D3Hunter marked this pull request as ready for review July 2, 2026 09:22
@ti-chi-bot ti-chi-bot Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 2, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/dxf/framework/scheduler/scheduler_manager_nokit_test.go (1)

131-143: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Test doesn't exercise the real batch-limit enforcement.

This scenario mocks GetTasksInStates to already return a capped slice (manyTasks[:maxCleanupTaskBatchSize]), so it only verifies that doCleanupTask forwards whatever TaskManager returns — it does not validate that the real SQL limit 100 in storage/task_table.go is actually what produces that batch. Combined with maxCleanupTaskBatchSize not being referenced by production code (see comment on scheduler_manager.go), this test can pass even if the real storage-layer limit diverges from 100.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dxf/framework/scheduler/scheduler_manager_nokit_test.go` around lines 131
- 143, The cleanup batching test is only asserting the mocked return value from
GetTasksInStates, so it does not validate the real batch-limit behavior. Update
doCleanupTask-related testing in scheduler_manager_nokit_test.go to use an
over-limit task set and verify that the storage-facing path, not the mock slice,
enforces the limit; reference doCleanupTask, GetTasksInStates, and
TransferTasks2History when adjusting the expectations. If the production limit
is defined in storage/task_table.go, align the test with that source of truth
instead of maxCleanupTaskBatchSize so the test fails when the real batch size
diverges.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/dxf/framework/scheduler/scheduler_manager.go`:
- Around line 54-55: The cleanup batch size is duplicated between
maxCleanupTaskBatchSize and the hardcoded limit in task_table.go, so there are
two independent sources of truth. Move the value into a shared constant that can
be referenced by both scheduler_manager_nokit_test.go and the cleanup logic in
task_table.go, then replace both literals with that shared symbol to keep the
limit consistent.

---

Nitpick comments:
In `@pkg/dxf/framework/scheduler/scheduler_manager_nokit_test.go`:
- Around line 131-143: The cleanup batching test is only asserting the mocked
return value from GetTasksInStates, so it does not validate the real batch-limit
behavior. Update doCleanupTask-related testing in
scheduler_manager_nokit_test.go to use an over-limit task set and verify that
the storage-facing path, not the mock slice, enforces the limit; reference
doCleanupTask, GetTasksInStates, and TransferTasks2History when adjusting the
expectations. If the production limit is defined in storage/task_table.go, align
the test with that source of truth instead of maxCleanupTaskBatchSize so the
test fails when the real batch size diverges.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 29dee4af-30bc-45cd-bdad-62a5a9c1c9e4

📥 Commits

Reviewing files that changed from the base of the PR and between dd1be2d and 5468ebd.

📒 Files selected for processing (17)
  • pkg/dxf/framework/integrationtests/bench_test.go
  • pkg/dxf/framework/proto/BUILD.bazel
  • pkg/dxf/framework/proto/task.go
  • pkg/dxf/framework/proto/task_test.go
  • pkg/dxf/framework/scheduler/BUILD.bazel
  • pkg/dxf/framework/scheduler/interface.go
  • pkg/dxf/framework/scheduler/scheduler.go
  • pkg/dxf/framework/scheduler/scheduler_manager.go
  • pkg/dxf/framework/scheduler/scheduler_manager_nokit_test.go
  • pkg/dxf/framework/scheduler/scheduler_nokit_test.go
  • pkg/dxf/framework/scheduler/scheduler_test.go
  • pkg/dxf/framework/storage/table_test.go
  • pkg/dxf/framework/storage/task_table.go
  • pkg/server/handler/tests/dxf_test.go
  • pkg/server/handler/tikvhandler/dxf.go
  • pkg/server/http_status.go
  • tests/realtikvtest/addindextest1/disttask_test.go

Comment on lines +54 to +55
const maxCleanupTaskBatchSize = 100

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify maxCleanupTaskBatchSize is not used in production code, and find the real enforcement point.
rg -nP '\bmaxCleanupTaskBatchSize\b' --type=go
rg -nP 'limit\s+100' --type=go pkg/dxf/framework/storage

Repository: pingcap/tidb

Length of output: 150


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate relevant files and inspect the surrounding code that defines and uses the cleanup batch size.
git ls-files 'pkg/dxf/framework/**' | rg 'scheduler_manager\.go|task_table\.go|scheduler_manager_nokit_test\.go|proto|interface\.go'

echo '--- scheduler_manager.go ---'
sed -n '1,260p' pkg/dxf/framework/scheduler/scheduler_manager.go

echo '--- task_table.go ---'
sed -n '1,260p' pkg/dxf/framework/storage/task_table.go

echo '--- scheduler_manager_nokit_test.go ---'
sed -n '1,220p' pkg/dxf/framework/scheduler/scheduler_manager_nokit_test.go

Repository: pingcap/tidb

Length of output: 24775


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find every reference to the cleanup batch-size constant and the SQL limit in the framework.
rg -n 'maxCleanupTaskBatchSize|limit 100|GetTasksInStates|cleanupFinishedTasks|doCleanupTask' pkg/dxf/framework

Repository: pingcap/tidb

Length of output: 5745


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect any surrounding declarations if the batch size constant lives in a nearby package.
rg -n 'Max.*Cleanup|Cleanup.*Batch|BatchSize|GetMaxConcurrentTask|MaxConcurrentTask' pkg/dxf/framework

Repository: pingcap/tidb

Length of output: 5420


🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '425,470p' pkg/dxf/framework/scheduler/scheduler_manager.go
sed -n '434,452p' pkg/dxf/framework/storage/task_table.go

Repository: pingcap/tidb

Length of output: 2263


Deduplicate the cleanup batch size constant.

maxCleanupTaskBatchSize is only used by scheduler_manager_nokit_test.go; the actual cleanup limit is still the hardcoded limit 100 in pkg/dxf/framework/storage/task_table.go. That leaves two independent copies of the same value. Move the batch size to a shared constant and use it in both places.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dxf/framework/scheduler/scheduler_manager.go` around lines 54 - 55, The
cleanup batch size is duplicated between maxCleanupTaskBatchSize and the
hardcoded limit in task_table.go, so there are two independent sources of truth.
Move the value into a shared constant that can be referenced by both
scheduler_manager_nokit_test.go and the cleanup logic in task_table.go, then
replace both literals with that shared symbol to keep the limit consistent.

@ti-chi-bot

ti-chi-bot Bot commented Jul 2, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign gmhdbjd for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jul 2, 2026
@codecov

codecov Bot commented Jul 2, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 54.74138% with 105 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release-nextgen-202603@dd1be2d). Learn more about missing BASE report.

⚠️ Current head b2a173a differs from pull request most recent head 7f617ad

Please upload reports for the commit 7f617ad to get more accurate results.

Additional details and impacted files
@@                     Coverage Diff                     @@
##             release-nextgen-202603     #69597   +/-   ##
===========================================================
  Coverage                          ?   76.1399%           
===========================================================
  Files                             ?       1935           
  Lines                             ?     540255           
  Branches                          ?          0           
===========================================================
  Hits                              ?     411350           
  Misses                            ?     128905           
  Partials                          ?          0           
Flag Coverage Δ
unit 76.1399% <54.7413%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 61.5065% <0.0000%> (?)
parser ∅ <0.0000%> (?)
br 48.7548% <0.0000%> (?)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
pkg/dxf/framework/storage/task_table.go (1)

443-449: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Document the *3 limit multiplier rationale.

The existing GetMaxConcurrentTask()*2 limit for top unfinished tasks is documented in interface.go ("returns unfinished tasks, limited by GetMaxConcurrentTask()*2, to make sure low ranking tasks can be scheduled if resource is enough"). The new *3 multiplier here has no such explanation, making it unclear why this specific factor was chosen versus *2 or a fixed value.

📝 Suggested comment
+	// limit results to GetMaxConcurrentTask()*3 to bound query cost while still
+	// covering enough tasks across states in a single tick.
 	args := make([]any, 0, len(states)+1)
 	args = append(args, states...)
 	args = append(args, proto.GetMaxConcurrentTask()*3)

As per coding guidelines, **/*.go code "Comments SHOULD explain non-obvious intent, constraints, invariants, concurrency guarantees, SQL/compatibility contracts, or important performance trade-offs."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dxf/framework/storage/task_table.go` around lines 443 - 449, Document the
non-obvious intent behind the `proto.GetMaxConcurrentTask()*3` limit in the task
query so it’s clear why this multiplier differs from the
`GetMaxConcurrentTask()*2` policy described in `interface.go`. Add a concise
comment near the `ExecuteSQLWithNewSession` call in `task_table.go` explaining
the scheduling/performance reason for choosing `*3`, and reference the relevant
task-fetching logic such as `TaskColumns`, `states`, and the unfinished-task
selection query so future readers can connect the rationale to this code path.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@pkg/dxf/framework/storage/task_table.go`:
- Around line 443-449: Document the non-obvious intent behind the
`proto.GetMaxConcurrentTask()*3` limit in the task query so it’s clear why this
multiplier differs from the `GetMaxConcurrentTask()*2` policy described in
`interface.go`. Add a concise comment near the `ExecuteSQLWithNewSession` call
in `task_table.go` explaining the scheduling/performance reason for choosing
`*3`, and reference the relevant task-fetching logic such as `TaskColumns`,
`states`, and the unfinished-task selection query so future readers can connect
the rationale to this code path.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 90d0fb2b-a073-404c-8469-9e4e008662e4

📥 Commits

Reviewing files that changed from the base of the PR and between 9896721 and 0332665.

📒 Files selected for processing (2)
  • pkg/dxf/framework/storage/table_test.go
  • pkg/dxf/framework/storage/task_table.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/dxf/framework/storage/table_test.go

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/dxf/importinto/clean_up.go`:
- Around line 138-156: `sendMeterOnCleanUpInParallel` is incorrectly using
`needFileCleanUp` to decide whether to emit cleanup metering, which ties
billing/reporting to file deletion eligibility. Update the loop in
`sendMeterOnCleanUpInParallel` so `sendFn` runs for succeeded tasks regardless
of `needFileCleanUp`, and keep file cleanup gating separate from metering logic.
Use the existing `cleanUpTaskInfo`, `task.State`, and `sendMeterOnCleanUpFunc`
flow to locate and adjust the condition.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 4f3f7f4e-3ce8-4877-802a-b22ac0e9047b

📥 Commits

Reviewing files that changed from the base of the PR and between b2a173a and 8c9a857.

📒 Files selected for processing (4)
  • .agents/skills/tidb-test-guidelines/references/dxf-case-map.md
  • pkg/dxf/importinto/BUILD.bazel
  • pkg/dxf/importinto/clean_up.go
  • pkg/dxf/importinto/clean_up_test.go
✅ Files skipped from review due to trivial changes (1)
  • .agents/skills/tidb-test-guidelines/references/dxf-case-map.md

Comment on lines +138 to +156
func sendMeterOnCleanUpInParallel(ctx context.Context, cleanUpTasks []cleanUpTaskInfo, sendFn sendMeterOnCleanUpFunc) error {
eg, egCtx := util.NewErrorGroupWithRecoverWithCtx(ctx)
eg.SetLimit(cleanUpMeteringConcurrency)
for _, cleanUpTask := range cleanUpTasks {
cleanUpTask := cleanUpTask
if !cleanUpTask.needFileCleanUp || cleanUpTask.task.State != proto.TaskStateSucceed {
continue
}
eg.Go(func() error {
logger := logutil.BgLogger().With(zap.Int64("task-id", cleanUpTask.task.ID))
if err := sendFn(egCtx, cleanUpTask.task, logger); err != nil {
logger.Warn("failed to send metering data on cleanup", zap.Error(err))
return err
}
return nil
})
}
return eg.Wait()
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check whether nextgen import plans can have an empty CloudStorageURI while still succeeding.
rg -n -B3 -A10 'CloudStorageURI' pkg/dxf/importinto/planner.go

Repository: pingcap/tidb

Length of output: 2708


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the cleanup flow and how needFileCleanUp is derived.
grep -nE 'needFileCleanUp|sendMeterOnCleanUp|CloudStorageURI|TaskStateSucceed' pkg/dxf/importinto/clean_up.go
printf '\n---\n'
sed -n '1,260p' pkg/dxf/importinto/clean_up.go

printf '\n=== planner references ===\n'
# Show the planner context around CloudStorageURI use to see whether it is optional or required.
sed -n '1,130p' pkg/dxf/importinto/planner.go
printf '\n---\n'
sed -n '390,490p' pkg/dxf/importinto/planner.go
printf '\n---\n'
sed -n '710,770p' pkg/dxf/importinto/planner.go

Repository: pingcap/tidb

Length of output: 17571


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find where CloudStorageURI is validated or defaulted, and whether nextgen imports require it.
rg -n 'CloudStorageURI|IsNextGen|GlobalSort|GetSortStore|manual.*cloud|cloud storage' pkg/dxf/importinto pkg/executor/importer

printf '\n=== task/meta definitions ===\n'
rg -n 'type TaskMeta|type PostProcessStepMeta|type Plan struct|CloudStorageURI string' pkg/dxf/importinto pkg/executor/importer

printf '\n=== focused file slices ===\n'
sed -n '1,220p' pkg/executor/importer/plan.go 2>/dev/null || true
sed -n '1,260p' pkg/executor/importer/*.go 2>/dev/null | sed -n '1,260p' || true

Repository: pingcap/tidb

Length of output: 19134


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect how Plan.CloudStorageURI affects nextgen/local-sort behavior and whether an empty value is valid.
sed -n '700,780p' pkg/executor/importer/import.go
printf '\n---\n'
sed -n '930,1010p' pkg/executor/importer/import.go
printf '\n---\n'
sed -n '1828,1855p' pkg/executor/importer/import.go
printf '\n---\n'
sed -n '180,230p' pkg/dxf/importinto/scheduler.go
printf '\n---\n'
sed -n '1,220p' pkg/executor/importer/precheck.go

Repository: pingcap/tidb

Length of output: 13151


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show the scheduler logic around GlobalSort derivation and nextgen cleanup scheduling.
sed -n '160,240p' pkg/dxf/importinto/scheduler.go
printf '\n---\n'
sed -n '280,360p' pkg/dxf/importinto/scheduler.go
printf '\n---\n'
sed -n '600,650p' pkg/dxf/importinto/scheduler.go

Repository: pingcap/tidb

Length of output: 7357


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect import plan construction for when CloudStorageURI is empty vs defaulted, especially in nextgen.
sed -n '720,770p' pkg/executor/importer/import.go
printf '\n---\n'
sed -n '940,970p' pkg/executor/importer/import.go
printf '\n---\n'
sed -n '1240,1295p' pkg/executor/importer/import.go
printf '\n---\n'
sed -n '1,120p' pkg/executor/importer/precheck.go

Repository: pingcap/tidb

Length of output: 8183


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the plan lifecycle and precheck rules to determine whether empty CloudStorageURI is allowed for nextgen imports.
sed -n '720,770p' pkg/executor/importer/import.go
printf '\n---\n'
sed -n '940,970p' pkg/executor/importer/import.go
printf '\n---\n'
sed -n '1248,1288p' pkg/executor/importer/import.go
printf '\n---\n'
sed -n '1,190p' pkg/executor/importer/precheck.go

Repository: pingcap/tidb

Length of output: 9242


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# See what metric data SendRowAndSizeMeterData represents and where else it is sent.
rg -n 'SendRowAndSizeMeterData\(' pkg | sed -n '1,120p'
printf '\n---\n'
sed -n '1,220p' pkg/dxf/framework/handle/*.go 2>/dev/null || true

Repository: pingcap/tidb

Length of output: 7343


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the tests around CloudStorageURI to see whether nextgen plans can be empty or are always populated.
sed -n '150,230p' pkg/executor/importer/import_test.go
printf '\n---\n'
sed -n '620,650p' pkg/executor/importer/import_test.go
printf '\n---\n'
sed -n '360,390p' pkg/dxf/importinto/planner_test.go

Repository: pingcap/tidb

Length of output: 5687


Decouple metering from file-cleanup eligibility. needFileCleanUp only tracks cloud-storage deletion, but it also gates sendMeterOnCleanUp, so next-gen jobs that run with an empty CloudStorageURI (local sort) will skip row/size reporting and undercount billing.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dxf/importinto/clean_up.go` around lines 138 - 156,
`sendMeterOnCleanUpInParallel` is incorrectly using `needFileCleanUp` to decide
whether to emit cleanup metering, which ties billing/reporting to file deletion
eligibility. Update the loop in `sendMeterOnCleanUpInParallel` so `sendFn` runs
for succeeded tasks regardless of `needFileCleanUp`, and keep file cleanup
gating separate from metering logic. Use the existing `cleanUpTaskInfo`,
`task.State`, and `sendMeterOnCleanUpFunc` flow to locate and adjust the
condition.

@ti-chi-bot

ti-chi-bot Bot commented Jul 3, 2026

Copy link
Copy Markdown

@D3Hunter: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-build-next-gen 7f617ad link true /test pull-build-next-gen
pull-unit-test-next-gen 7f617ad link true /test pull-unit-test-next-gen
pull-integration-realcluster-test-next-gen 7f617ad link true /test pull-integration-realcluster-test-next-gen

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant