Skip to content

kv: harden test transaction 2pc core#241

Open
lance6716 wants to merge 1 commit into
tikv:masterfrom
lance6716:lance/txn-core-fixes
Open

kv: harden test transaction 2pc core#241
lance6716 wants to merge 1 commit into
tikv:masterfrom
lance6716:lance/txn-core-fixes

Conversation

@lance6716
Copy link
Copy Markdown

@lance6716 lance6716 commented May 29, 2026

Summary

  • add 2PC cleanup with KvBatchRollback for failed transactions before the primary is known committed/undetermined
  • mark primary commit RPC failures as result-undetermined and treat secondary commit errors after primary commit as success-like
  • retry CommitTsExpired with a fresh commit ts instead of changing commit ts on generic region retries
  • compute transaction size from key/value bytes, handle empty transactions, and use byte size for lock TTL / TTLManager threshold
  • add/adjust real TiKV regression coverage for rollback conflict, empty txn, commit-ts-expired retry, cleanup after failed prewrite, and large-value TTL

Tests

  • docker run --rm -v "$PWD":/client-c -w /client-c local/tikv-client-c-deps:ubuntu22 bash -lc 'cmake --build build-docker-t118 -j$(nproc)'
  • tiup playground nightly --mode tikv-slim --tag client-c-t118 --without-monitor (TiKV v9.0.0-beta.2.pre-nightly)
  • docker run --rm --network host -v "$PWD":/client-c -w /client-c local/tikv-client-c-deps:ubuntu22 bash -lc './build-docker-t118/src/test/real_tikv_test/2pc_test --gtest_color=no --gtest_filter="TestWith2PCRealTiKV.testCommitRollback:TestWith2PCRealTiKV.testEmptyTxnCommit:TestWith2PCRealTiKV.testCommitTsExpiredRetries:TestWith2PCRealTiKV.testFailedPrewriteCleansWrittenLocks:TestWith2PCRealTiKV.testLargeTxnTTLUsesBytes:TestWith2PCRealTiKV.commitAfterReadByOtherTxn"'

Known local-env issue:

  • docker run --rm -v "$PWD":/client-c -w /client-c local/tikv-client-c-deps:ubuntu22 bash -lc './build-docker-t118/src/test/kv_client_ut --gtest_color=no --gtest_filter="TestWithLockResolve.testResolveLockGet"' fails in SetUp with Connection refused from the mock cluster, same as the broader mock test environment in this container.

Summary by CodeRabbit

  • Bug Fixes

    • Improved transaction cleanup on commit failures with better error handling and fallback behavior
    • Fixed transaction TTL calculation to prevent truncation in large transactions
    • Enhanced rollback cleanup for failed prewrite operations
  • Tests

    • Added integration tests for empty transaction commits
    • Added tests for commit-timestamp expiration and retry scenarios
    • Added lock cleanup verification tests

Review Change Stack

Copilot AI review requested due to automatic review settings May 29, 2026 07:32
@ti-chi-bot ti-chi-bot Bot added the dco-signoff: yes Indicates the PR's author has signed the dco. label May 29, 2026
@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented May 29, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign andremouche for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 29, 2026

Warning

Review limit reached

@lance6716, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 23 minutes and 20 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2141d651-9cbe-4910-86bf-5457d05a55ca

📥 Commits

Reviewing files that changed from the base of the PR and between 3a26c8b and 46bf3d2.

📒 Files selected for processing (3)
  • include/pingcap/kv/2pc.h
  • src/kv/2pc.cc
  • src/test/real_tikv_test/2pc_test.cc
📝 Walkthrough

Walkthrough

TwoPhaseCommitter extends 2PC with a cleanup mechanism for written keys, improves lock TTL computation from transaction size, tracks commit error states, and recovers from commit-timestamp expiration by updating from PD and retrying recursively. Integration tests validate empty transactions, expiration retry, cleanup on prewrite failure, and TTL sizing.

Changes

Two-Phase Commit Cleanup and Error Recovery

Layer / File(s) Summary
Lock TTL and transaction size computation
include/pingcap/kv/2pc.h, src/kv/2pc.cc
txnLockTTL uses floating-point division to compute lock TTL accurately from transaction size in MiB. Constructor accumulates txn_size during mutation collection, computes primary_lock only when keys exist, and derives lock_ttl via txnLockTTL instead of defaulting to defaultLockTTL. Early return when transaction has no keys.
Cleanup API and action routing
include/pingcap/kv/2pc.h, src/kv/2pc.cc
Public cleanupKeys method dispatches cleanup via ActionCleanUp into the batch-action handler. Batch action dispatch now handles ActionCleanUp by calling cleanupSingleBatch, which issues KvBatchRollback for each batch with region-miss backoff and fallback to full cleanup.
Error handling and commit-timestamp expiration recovery
include/pingcap/kv/2pc.h, src/kv/2pc.cc
commit_result_undetermined boolean tracks whether commit state is recoverable. execute() closes TTL manager on exception and conditionally cleans up only when not undetermined. commitSingleBatch sets commit_result_undetermined on primary-batch errors (except RegionEpochNotMatch), handles commit_ts_expired by validating min_commit_ts distance, updates commit_ts from PD, and retries recursively.
Supporting refactors and batch detection
include/pingcap/kv/2pc.h, src/kv/2pc.cc
Header adds <shared_mutex> include. Primary-batch swap condition rewritten equivalently. prewriteSingleBatch switches from comparing first key to primary_lock to using batch.is_primary boolean.
Test infrastructure and validation
src/test/real_tikv_test/2pc_test.cc
TestTwoPhaseCommitter constructor reformatted and accessors split across lines. testCommitRollback now asserts txn1.commit() throws on write conflict. New tests: commit empty transaction without exception, retry commit after commit_ts_expired, verify failed prewrite cleans up written locks and later succeeds, validate large-transaction TTL sizing from computed txnSize and lockTTL. Namespace comment updated to pingcap::tests.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A rabbit hops through two-phase paths,
Where cleanup whispers after wrath,
Lock TTLs now compute just right,
And expired timestamps see new light—
Recovery's hop, a second chance,
In transactional dance! 🎀

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main changes: hardening 2PC core functionality and expanding test coverage, which aligns with the primary objectives of improving cleanup logic, handling edge cases, and adding regression tests.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ti-chi-bot ti-chi-bot Bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 29, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR enhances the 2PC (two-phase commit) implementation to better handle empty transactions, lock TTL sizing based on actual byte size, commit timestamp expiry retries, and cleanup behavior on commit/prewrite failures.

Changes:

  • Compute transaction size in bytes to drive lock TTL decisions; handle empty transactions as a no-op commit.
  • Add retry handling for TiKV CommitTsExpired and introduce cleanup (batch rollback) on certain failures.
  • Extend real TiKV integration tests to cover empty commits, CommitTsExpired retry, cleanup after failed prewrite, and large-TTL sizing.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
src/test/real_tikv_test/2pc_test.cc Adds real-cluster tests for empty txn commit, commit-ts-expired retry behavior, cleanup after failed prewrite, and TTL sizing based on byte-size.
src/kv/2pc.cc Updates txn sizing to bytes, computes TTL via txnLockTTL, handles empty txns, adds cleanup on failures, and retries CommitTsExpired.
include/pingcap/kv/2pc.h Adds cleanup action plumbing, tracks commit_result_undetermined, and refactors primary-batch detection via BatchKeys::is_primary.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/kv/2pc.cc Outdated
Comment on lines +293 to +307
if (response.error().has_commit_ts_expired())
{
const auto & rejected = response.error().commit_ts_expired();
if (rejected.min_commit_ts() > rejected.attempted_commit_ts()
&& rejected.min_commit_ts() - rejected.attempted_commit_ts() > (3600000ULL << pd::physicalShiftBits))
{
throw Exception("2PC MinCommitTS is too large, got MinCommitTS: " + std::to_string(rejected.min_commit_ts())
+ ", AttemptedCommitTS: " + std::to_string(rejected.attempted_commit_ts()),
LockError);
}
commit_ts = cluster->pd_client->getTS();
req.set_commit_version(commit_ts);
commitSingleBatch(bo, batch);
return;
}
Comment thread src/kv/2pc.cc Outdated
{
const auto & rejected = response.error().commit_ts_expired();
if (rejected.min_commit_ts() > rejected.attempted_commit_ts()
&& rejected.min_commit_ts() - rejected.attempted_commit_ts() > (3600000ULL << pd::physicalShiftBits))
Comment thread src/kv/2pc.cc Outdated
}
if (response.has_error())
{
throw Exception("meet cleanup errors: " + response.error().ShortDebugString(), LockError);
Comment thread src/kv/2pc.cc
Comment on lines +116 to +121
ttl_manager.close();
if (commited)
{
// TODO: Rollback keys.
log->warning("write commit exception after primary committed: " + e.displayText());
return;
}
Comment thread src/test/real_tikv_test/2pc_test.cc Outdated
Comment on lines +135 to +149
TestTwoPhaseCommitter committer{&txn};
Backoffer prewrite_bo(prewriteMaxBackoff);
committer.prewriteKeys(prewrite_bo, committer.keys());

// A reader after prewrite pushes the lock's min_commit_ts. Commit with a
// stale commit_ts should be rejected by TiKV with CommitTsExpired, and the
// client should retry the same primary commit with a fresh commit_ts.
Txn reader(test_cluster.get());
auto result = reader.get(key);
ASSERT_EQ(result.second, true);
ASSERT_EQ(result.first, "v0");

committer.setCommitTS(committer.startTS() + 1);
Backoffer commit_bo(commitMaxBackoff);
ASSERT_NO_THROW(committer.commitKeys(commit_bo, committer.keys()));
Signed-off-by: lance6716 <lance6716@gmail.com>
@lance6716 lance6716 force-pushed the lance/txn-core-fixes branch from 3a26c8b to 46bf3d2 Compare May 29, 2026 08:07
@lance6716
Copy link
Copy Markdown
Author

Thanks for the review. I'm an AI agent working under @lance6716's direction.

The comments are reasonable, and I addressed them in the latest push:

  • converted the CommitTsExpired retry path from recursion to a bounded iterative loop;
  • extracted/commented the 1-hour PD TSO skew threshold;
  • clarified the cleanup error message;
  • documented the post-primary-commit success semantics at the 2PC API boundary and added an inline rationale;
  • cached committer.keys() once in the real TiKV regression test.

I also amended the commit with a DCO Signed-off-by trailer and reran the focused real TiKV 2PC regression tests successfully.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dco-signoff: yes Indicates the PR's author has signed the dco. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants