From ebfe49ae26768db276bde5023e0e97001cce3528 Mon Sep 17 00:00:00 2001
From: Samuel Verschelde
Date: Thu, 28 Sep 2023 13:13:04 +0200
Subject: [PATCH] Wait for 10 minutes before the 5th sr-destroy attempt

We already have code which retries a failed sr-destroy, when the cause
is remaining hidden VDIs which haven't been garbage-collected yet, and
it's enough in most cases, but not always.

Sometimes, the force-gc that is triggered by SM before the delete
operation is not enough. VDIs that look like they should be removed
remain, and are removed only a bit later, when the normal garbage
collector runs. This may be related to the fact that the forced GC
operation doesn't coalesce VDIs, or to a special case where a recently
coalesced VDI is temporarily protected against deletion.

We add a 10 minutes wait before the 5th sr-destroy attempt, to give the
normal garbage collector enough time to run (it should run 5 minutes
after the last operation that creates a need for the GC to run).

Signed-off-by: Samuel Verschelde
---
 lib/sr.py | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/lib/sr.py b/lib/sr.py
index e984a07ba..c5fc43b31 100644
--- a/lib/sr.py
+++ b/lib/sr.py
@@ -1,4 +1,5 @@
 import logging
+import time
 
 import lib.commands as commands
 
@@ -95,7 +96,17 @@ def destroy(self, verify=False, force=False):
                     else:
                         logging.info("SR destroy failed due to SR not empty but there aren't any managed VDIs left.")
                         if i < max_tries:
-                            logging.info(f"Retrying sr-destroy in case it failed due to incomplete GC.")
+                            if i == max_tries - 1:
+                                # We tried already 4 times to destroy the SR, and there still are hidden VDIs that
+                                # couldn't be force-GCed. In this case, we likely need to give time to the normal GC
+                                # to run, which might also coalesce some VDIs if that's what it really needs.
+                                # The GC should kick approximately 5 minutes after the last operation we did, so let's
+                                # give it these 5 minutes plus extra time to complete.
+                                gc_delay = 600
+                                logging.warning(f"SR destroy failed {i} times in a row. "
+                                                f"Wait for {gc_delay}s, hoping GC fully runs before next try")
+                                time.sleep(gc_delay)
+                            logging.info("Retrying sr-destroy in case it previously failed due to incomplete GC.")
                             continue
                         else:
                             raise Exception(f"Could not destroy the SR even after {i} attempts.")