Wait for 10 minutes before the 5th sr-destroy attempt
We already have code that retries a failed sr-destroy when the cause is
remaining hidden VDIs that haven't been garbage-collected yet. This is
enough in most cases, but not always.

Sometimes, the force-gc that SM triggers before the delete operation
is not enough: VDIs that look like they should be removed remain, and
are only removed a bit later, when the normal garbage collector runs.

This may be related to the fact that the forced GC operation doesn't
coalesce VDIs, or to a special case where a recently coalesced VDI is
temporarily protected against deletion.

We add a 10-minute wait before the 5th sr-destroy attempt, to give the
normal garbage collector enough time to run (it should start about 5
minutes after the last operation that creates a need for a GC run).
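The retry schedule described above can be sketched as a standalone function. This is a minimal illustration, not the code from lib/sr.py: it assumes `destroy` is any callable that raises a hypothetical `HiddenVDIsRemainError` when sr-destroy fails because hidden VDIs remain (the real code inspects the `xe` error instead), and it makes `sleep` injectable so the schedule can be checked without actually waiting 10 minutes.

```python
import logging
import time


class HiddenVDIsRemainError(Exception):
    """Hypothetical: raised by `destroy` when sr-destroy fails because
    hidden VDIs have not been garbage-collected yet."""


def destroy_with_gc_retries(destroy, max_tries=5, gc_delay=600, sleep=time.sleep):
    """Call `destroy` until it succeeds, retrying on HiddenVDIsRemainError.

    Attempts are numbered from 1. After the (max_tries - 1)th failure, wait
    `gc_delay` seconds so the normal GC (expected about 5 minutes after the
    last operation) can run before the final attempt.
    Returns the number of the attempt that succeeded.
    """
    for i in range(1, max_tries + 1):
        try:
            destroy()
            return i
        except HiddenVDIsRemainError as err:
            if i == max_tries:
                # Out of attempts: give up, chaining the last failure as the cause.
                raise Exception(f"Could not destroy the SR even after {i} attempts.") from err
            if i == max_tries - 1:
                # The final attempt is next: give the regular GC time to run first.
                logging.warning("SR destroy failed %d times in a row. "
                                "Waiting %ds for the GC before the last try.", i, gc_delay)
                sleep(gc_delay)
            logging.info("Retrying sr-destroy in case it failed due to incomplete GC.")
    return None  # only reached if max_tries < 1


if __name__ == "__main__":
    failures_left = [4]  # fail 4 times, then succeed

    def flaky_destroy():
        if failures_left[0] > 0:
            failures_left[0] -= 1
            raise HiddenVDIsRemainError()

    slept = []
    print(destroy_with_gc_retries(flaky_destroy, sleep=slept.append))  # 5
    print(slept)  # [600]
```

With the default `max_tries=5`, the only long wait happens between the 4th failure and the 5th attempt, matching the commit title. Injecting `sleep` (here `slept.append`) keeps the 600-second pause out of tests; the actual diff calls `time.sleep(gc_delay)` directly.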

Signed-off-by: Samuel Verschelde <[email protected]>
stormi committed Sep 28, 2023
1 parent c7448ee commit ebfe49a
Showing 1 changed file with 12 additions and 1 deletion.
lib/sr.py
```diff
@@ -1,4 +1,5 @@
 import logging
+import time
 
 import lib.commands as commands
 
@@ -95,7 +96,17 @@ def destroy(self, verify=False, force=False):
 else:
     logging.info("SR destroy failed due to SR not empty but there aren't any managed VDIs left.")
 if i < max_tries:
-    logging.info(f"Retrying sr-destroy in case it failed due to incomplete GC.")
+    if i == max_tries - 1:
+        # We tried already 4 times to destroy the SR, and there still are hidden VDIs that
+        # couldn't be force-GCed. In this case, we likely need to give time to the normal GC
+        # to run, which might also coalesce some VDIs if that's what it really needs.
+        # The GC should kick approximately 5 minutes after the last operation we did, so let's
+        # give it these 5 minutes plus extra time to complete.
+        gc_delay = 600
+        logging.warning(f"SR destroy failed {i} times in a row. "
+                        f"Wait for {gc_delay}s, hoping GC fully runs before next try")
+        time.sleep(gc_delay)
+    logging.info("Retrying sr-destroy in case it previously failed due to incomplete GC.")
     continue
 else:
     raise Exception(f"Could not destroy the SR even after {i} attempts.")
```