pd: try_connect_leader panics on None leader during PD pod restart #539

@MananShukla7

Description

try_connect_leader in src/pd/cluster.rs calls .unwrap() on previous.leader without guarding against the None case:

let previous_leader = previous.leader.as_ref().unwrap();

During a rolling restart of PD pods in a Kubernetes environment, GetMembersResponse.leader is temporarily None while a new leader is being elected, so this unwrap panics. The panic is unrecoverable and takes down the client process entirely.

Steps to reproduce

  1. Run a TiKV cluster on Kubernetes with multiple PD pods
  2. Connect the Rust client while the cluster is healthy
  3. Trigger a rolling restart of PD pods (kubectl rollout restart)
  4. During the leader election window, the client receives a GetMembersResponse with leader: None
  5. Client panics at previous.leader.as_ref().unwrap()
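
The failing step can be approximated without a cluster. The sketch below is only an illustration: `GetMembersResponse` here is a simplified stand-in with a single field, not the real PD type, and `unwrap_leader_panics` is a hypothetical helper that uses `std::panic::catch_unwind` to observe the panic instead of crashing.

```rust
use std::panic;

// Simplified stand-in for the PD response; not the real generated type.
#[derive(Default)]
struct GetMembersResponse {
    leader: Option<String>,
}

// Hypothetical helper: returns true if unwrapping the leader panics.
fn unwrap_leader_panics(previous: &GetMembersResponse) -> bool {
    panic::catch_unwind(|| {
        // Same shape as the line in try_connect_leader.
        let _previous_leader = previous.leader.as_ref().unwrap();
    })
    .is_err()
}

fn main() {
    // What the client sees during the election window: no leader.
    let electing = GetMembersResponse::default();
    println!("panics with no leader: {}", unwrap_leader_panics(&electing));

    // A healthy response with a leader does not panic.
    let healthy = GetMembersResponse {
        leader: Some("pd-0".to_string()),
    };
    println!("panics with a leader: {}", unwrap_leader_panics(&healthy));
}
```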

Panic output

thread 'tokio-runtime-worker' panicked at 'called `Option::unwrap()` on a `None` value'
src/pd/cluster.rs:310

Expected behaviour

The client should return a recoverable Err and allow the caller to retry, rather than panicking and crashing the process.
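
To make "allow the caller to retry" concrete, here is a rough caller-side sketch. It is synchronous and uses only the standard library to stay self-contained; `retry_with_backoff` is a hypothetical helper and not part of tikv-client, and the closure in `main` merely simulates an election window.

```rust
use std::thread::sleep;
use std::time::Duration;

// Hypothetical retry helper: calls `op` at least once and at most
// `max_attempts` times, doubling the delay after each failure.
fn retry_with_backoff<T, E>(
    max_attempts: u32,
    initial_delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: hand the last error back to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                sleep(delay);
                delay *= 2;
                attempt += 1;
            }
        }
    }
}

fn main() {
    // Simulate an election window: the first two calls see no leader.
    let mut calls = 0;
    let result: Result<&str, String> =
        retry_with_backoff(5, Duration::from_millis(10), || {
            calls += 1;
            if calls < 3 {
                Err("PD cluster has no leader".to_string())
            } else {
                Ok("pd-1")
            }
        });
    println!("{result:?} after {calls} calls");
}
```

This only works if the missing leader surfaces as an Err; a panic bypasses the retry loop entirely.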

Suggested fix

Replace .unwrap() with .ok_or_else(), consistent with the style already used elsewhere in this file:

let previous_leader = previous
    .leader
    .as_ref()
    .ok_or_else(|| internal_err!("PD cluster has no leader"))?;
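
For anyone who wants to try the pattern in isolation, here is a minimal standalone sketch. `Member`, `GetMembersResponse`, and `previous_leader` are simplified stand-ins I made up for illustration, not the real client types, and a plain `String` error replaces `internal_err!` so the snippet compiles on its own.

```rust
// Simplified stand-ins for the real types; only the fields needed here.
#[derive(Debug, Clone, PartialEq)]
struct Member {
    name: String,
}

#[derive(Debug, Default)]
struct GetMembersResponse {
    leader: Option<Member>,
}

// Hypothetical helper mirroring the suggested fix: a missing leader becomes
// an Err the caller can handle instead of a panic.
fn previous_leader(previous: &GetMembersResponse) -> Result<&Member, String> {
    previous
        .leader
        .as_ref()
        .ok_or_else(|| "PD cluster has no leader".to_string())
}

fn main() {
    // Response as seen during the election window: leader is None.
    let electing = GetMembersResponse::default();
    match previous_leader(&electing) {
        Ok(member) => println!("leader: {}", member.name),
        Err(err) => println!("recoverable error: {err}"),
    }
}
```

In the real function the `?` operator would propagate this error to the caller, which is what makes a retry possible.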

A fix is available in PR #538.

Environment

  • Client: tikv-client (master)
  • Deployment: Kubernetes with multiple PD pods
  • Trigger: PD pod rolling restart / leader election

Metadata

Labels

  • contribution: This PR is from a community contributor.
  • first-time-contributor: Indicates that the PR was contributed by an external member and is a first-time contributor.
