fix: handle case where PD cluster has no leader #538

Open: MananShukla7 wants to merge 1 commit into tikv:master from MananShukla7:fix/pd-leader-none-panic

@MananShukla7 MananShukla7 commented Jun 3, 2026

Problem

In try_connect_leader, the previous leader is accessed with .unwrap():

let previous_leader = previous.leader.as_ref().unwrap();

This panics when the PD cluster temporarily has no leader, which can happen during a rolling restart of PD pods in a Kubernetes environment. During leader election, GetMembersResponse.leader can be None, so the client crashes instead of returning a recoverable error.
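
The failure mode can be illustrated in isolation. The sketch below is not the client's code: `Member` and `GetMembersResponse` are simplified, hypothetical stand-ins for the PD protobuf types, and `leader_name_unwrap` and `unwrap_access_panics` are illustrative helpers. It only shows that the unwrap-based access panics when `leader` is `None`.

```rust
use std::panic;

// Hypothetical, simplified stand-ins for the PD protobuf types.
// Only the optional `leader` field matters for this illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct Member {
    pub name: String,
}

#[derive(Debug, Clone, Default)]
pub struct GetMembersResponse {
    pub leader: Option<Member>,
}

/// Mirrors the pre-fix access pattern: panics if `leader` is `None`.
pub fn leader_name_unwrap(previous: &GetMembersResponse) -> String {
    let previous_leader = previous.leader.as_ref().unwrap();
    previous_leader.name.clone()
}

/// Returns true if the unwrap-based access panics for the given response.
/// `catch_unwind` is used here only to observe the panic in a demo.
pub fn unwrap_access_panics(previous: &GetMembersResponse) -> bool {
    panic::catch_unwind(|| leader_name_unwrap(previous)).is_err()
}
```

Calling `unwrap_access_panics` with `GetMembersResponse { leader: None }` reports a panic, while a response carrying a leader does not.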

Fix

Replace .unwrap() with .ok_or_else() to return a descriptive Err instead of panicking. This is consistent with the style already used in this file.

Before:

let previous_leader = previous.leader.as_ref().unwrap();

After:

let previous_leader = previous
    .leader
    .as_ref()
    .ok_or_else(|| internal_err!("PD cluster has no leader"))?;

The same pattern is applied to the resp.leader lookup later in the function, replacing the existing ok_or_else error message with a clearer one.
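
A self-contained sketch of the same pattern, under stated assumptions: `InternalError` is a hypothetical error type standing in for whatever `internal_err!` produces in the client, and `Member`, `GetMembersResponse`, and `previous_leader` are simplified illustrations rather than the real definitions.

```rust
use std::fmt;

// Hypothetical, simplified stand-ins for the PD protobuf types.
#[derive(Debug, Clone, PartialEq)]
pub struct Member {
    pub name: String,
}

#[derive(Debug, Clone, Default)]
pub struct GetMembersResponse {
    pub leader: Option<Member>,
}

// Hypothetical error type; the real client builds its error via `internal_err!`.
#[derive(Debug, PartialEq)]
pub struct InternalError(pub String);

impl fmt::Display for InternalError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "internal error: {}", self.0)
    }
}

impl std::error::Error for InternalError {}

/// Returns the previous leader, or an error if the cluster currently has none.
pub fn previous_leader(previous: &GetMembersResponse) -> Result<&Member, InternalError> {
    let leader = previous
        .leader
        .as_ref()
        // The closure runs only when `leader` is `None`, so the error
        // message is not allocated on the success path.
        .ok_or_else(|| InternalError("PD cluster has no leader".to_owned()))?;
    Ok(leader)
}
```

With this shape, a missing leader surfaces to the caller as an `Err` value that can be logged or retried, rather than unwinding the thread.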

Reproduction

Trigger a rolling restart of PD pods while the TiKV client is connected. The client panics at previous.leader.as_ref().unwrap() when the new GetMembersResponse arrives with leader: None during leader election.
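
The election window can also be approximated without a live cluster. The following is a rough simulation, not part of this PR: the types are hypothetical stand-ins, and `first_leader` is an illustrative helper that walks a sequence of responses in which `leader` is `None` until an election completes.

```rust
// Hypothetical, simplified stand-ins for the PD protobuf types.
#[derive(Debug, Clone, PartialEq)]
pub struct Member {
    pub name: String,
}

#[derive(Debug, Clone, Default)]
pub struct GetMembersResponse {
    pub leader: Option<Member>,
}

/// Walks a sequence of simulated responses and returns the first leader seen,
/// together with the 1-based number of the response that carried it.
/// Returns an error if no response in the sequence has a leader.
pub fn first_leader(responses: &[GetMembersResponse]) -> Result<(usize, &Member), String> {
    for (index, resp) in responses.iter().enumerate() {
        // A `None` leader models a response received during leader election.
        if let Some(leader) = resp.leader.as_ref() {
            return Ok((index + 1, leader));
        }
    }
    Err("PD cluster has no leader".to_owned())
}
```

Feeding it two leaderless responses followed by one with a leader yields the leader on the third response; a sequence with no leader at all yields the error instead of a panic.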

Checklist

  • cargo fmt --all
  • cargo clippy --all-targets --all-features -- -D warnings
  • cargo test
  • DCO sign-off (git commit -s)

Summary by CodeRabbit

  • Bug Fixes
    • Improved cluster connection logic to gracefully handle cases with no leader, returning a clear error instead of panicking, improving stability and observability.
  • Documentation
    • Added explanatory comments describing leader-connection behavior and its non-panicking error semantics.

@ti-chi-bot ti-chi-bot Bot added the dco-signoff: yes Indicates the PR's author has signed the dco. label Jun 3, 2026

ti-chi-bot Bot commented Jun 3, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign v01dstar for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added contribution This PR is from a community contributor. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. labels Jun 3, 2026

ti-chi-bot Bot commented Jun 3, 2026

Welcome @MananShukla7!

It looks like this is your first PR to tikv/client-rust 🎉.

I'm the bot that helps you request reviewers, add labels, and more. See available commands.

We want to make sure your contribution gets all the attention it needs!



Thank you, and welcome to tikv/client-rust. 😃


coderabbitai Bot commented Jun 3, 2026

Review Change Stack

Warning

Review limit reached

@MananShukla7, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 7 minutes and 59 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4e688fd9-aa53-4c81-b2da-3182350b0299

📥 Commits

Reviewing files that changed from the base of the PR and between 546d228 and e21c876.

📒 Files selected for processing (1)
  • src/pd/cluster.rs
📝 Walkthrough

Walkthrough

The PR hardens PD cluster connection logic by replacing a panic-prone unwrap() on previous.leader with an error path via ok_or_else, and adds documentation describing the leader-connection attempts and non-panicking semantics.

Changes

Missing leader error handling

| Layer / File(s) | Summary |
| --- | --- |
| Error handling for missing PD cluster leader (src/pd/cluster.rs) | previous_leader is now derived via ok_or_else when previous.leader is absent, returning internal_err!("PD cluster has no leader") instead of unwrapping and potentially panicking. Documentation for try_connect_leader describing attempt flow and non-panicking behavior was also added. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

A leader once lost in the fray,
Now returns with grace, not dismay,
No unwrap, no crash—
Just errors so rash,
A safer PD cluster today! 🐰

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and clearly summarizes the main change: replacing a panic-inducing unwrap() with proper error handling when PD cluster has no leader. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@MananShukla7 MananShukla7 force-pushed the fix/pd-leader-none-panic branch from 3158fcf to 546d228 Compare June 3, 2026 12:22
@ti-chi-bot ti-chi-bot Bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jun 3, 2026
@MananShukla7 (Author)

/cc @ekexium @andylokandy

@ti-chi-bot ti-chi-bot Bot requested review from andylokandy and ekexium June 3, 2026 12:23
Signed-off-by: MananShukla7 <shuklamanan8@gmail.com>

Labels

contribution This PR is from a community contributor. dco-signoff: yes Indicates the PR's author has signed the dco. first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
