test: retry RBD pool disable on transient mirror health flap #762
Open
UtkarshBhatthere wants to merge 1 commit into
The "Disable RBD mirror" step runs remote_disable_rbd_mirroring right after remote_failover_to_siteb. Failover promotes the secondary and triggers a resync, during which pool mirror health transiently flaps to WARNING. The daemon re-validates pool health on every non-forced pool-level disable (replication_rbd.go: "pool replication status not OK"), so a disable issued during that window exits 1 and fails the step.

PR #696 added a single up-front remote_wait_for_rbd_mirror_health call, but that is a TOCTOU check: health can flap back after the wait, and each disable perturbs health, so the one-shot wait cannot cover the per-operation re-validation.

Add rbd_disable_retry_transient_health, which retries a pool disable only while it is rejected with "status not OK", and route the three pool-level disables through it. It returns immediately on success or on any other outcome, so the negative test still observes the expected "in Image mirroring mode" guard. Image-level disables are not health gated and are left unchanged.

Assisted-by: hermes:claude-opus-4.8
Signed-off-by: Utkarsh Bhatt <utkarsh_bhatt@outlook.com>
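The retry behaviour described above can be illustrated with a minimal sketch. This is not the helper added by this PR: the function name retry_on_status_not_ok, the attempt limit, the delay, and the two stand-in commands are hypothetical and exist only for illustration. The only detail taken from the description is the "status not OK" substring that marks a rejection as transient.

```shell
# Hypothetical sketch: rerun a command only while it fails with output
# containing "status not OK". Success or any other failure is returned
# to the caller immediately, as is the last failure once attempts run out.
retry_on_status_not_ok() {
    local max_attempts="$1" delay="$2"
    shift 2
    local attempt=1 output rc transient
    while :; do
        rc=0
        # Capture stdout and stderr together; keep the exit code.
        output="$("$@" 2>&1)" || rc=$?
        transient=no
        case "$output" in
            *"status not OK"*) transient=yes ;;
        esac
        if [ "$rc" -eq 0 ] || [ "$transient" = no ] || [ "$attempt" -ge "$max_attempts" ]; then
            printf '%s\n' "$output"
            return "$rc"
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Stand-in for a pool disable that is rejected twice, then succeeds.
# The call counter lives in a file because the command runs in a subshell.
FLAP_FILE="$(mktemp)"
echo 0 > "$FLAP_FILE"
fake_disable_flapping() {
    local n
    n=$(( $(cat "$FLAP_FILE") + 1 ))
    echo "$n" > "$FLAP_FILE"
    if [ "$n" -le 2 ]; then
        echo "pool replication status not OK" >&2
        return 1
    fi
    echo "disabled"
}

# Stand-in for a disable rejected by an unrelated guard; must not be retried.
fake_disable_guard() {
    echo "in Image mirroring mode" >&2
    return 1
}
```

Matching on the specific rejection text rather than on the exit code alone is what keeps the negative test meaningful: a failure caused by a different guard passes through on the first attempt instead of being masked by retries.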