Rechecking pending Pods #195

Open

xagent003 wants to merge 1 commit into master from private/arjun/recheckPendingPod

Conversation

@xagent003 (Contributor) commented Feb 16, 2022

This fixes #182, while hopefully not re-breaking #162.

We shouldn't treat all pending Pods as "alive" and skip the check. The list of all Pods fetched earlier may be stale; in some observed scenarios it was taken several seconds before the ip-reconciler performs the isPodAlive check.

Instead, we can retry a Get on the individual Pod, in the hope that it now has its final IP/network annotations. So re-fetch the Pod, up to 5 times, if it is in the Pending state and the initial IP check fails. After that, do the IP matching check as before.
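
Roughly, the intended check looks like the sketch below (simplified stand-ins such as podWrapper, checkIpOnPod and refreshPod approximate the PR's helpers; this is not the exact diff):

// A rough sketch of the re-check logic described above. podWrapper,
// refreshPod and checkIpOnPod are simplified stand-ins for the PR's
// helpers, not the exact implementation.
package main

import "fmt"

const podRefreshRetries = 5

type podWrapper struct {
    ips   map[string]struct{} // IPs parsed from the pod's network-status annotation
    phase string              // e.g. "Pending", "Running"
}

func checkIpOnPod(pod *podWrapper, ip string) bool {
    _, found := pod.ips[ip]
    return found
}

// isPodAlive decides whether an IP reservation still belongs to a live pod.
// refreshPod re-reads the single pod from the API server (nil if it is gone).
func isPodAlive(livePod *podWrapper, ip string, refreshPod func() *podWrapper) bool {
    if checkIpOnPod(livePod, ip) {
        return true
    }
    if livePod.phase != "Pending" {
        return false
    }
    // The Pod is Pending and the listed copy has no IP yet: the earlier List
    // may be stale, so re-fetch the individual Pod a few times.
    podToMatch := livePod
    for retries := 0; retries < podRefreshRetries; retries++ {
        refreshed := refreshPod()
        if refreshed == nil {
            return false // pod is gone; safe to clean up
        }
        podToMatch = refreshed
        if refreshed.phase != "Pending" {
            break // pod settled; do one final IP check below
        }
        if checkIpOnPod(refreshed, ip) {
            return true // annotation showed up while still Pending
        }
    }
    return checkIpOnPod(podToMatch, ip)
}

func main() {
    stale := &podWrapper{ips: map[string]struct{}{}, phase: "Pending"}
    fresh := &podWrapper{ips: map[string]struct{}{"10.128.165.33": {}}, phase: "Pending"}
    fmt.Println(isPodAlive(stale, "10.128.165.33", func() *podWrapper { return fresh })) // true
}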

@xagent003 xagent003 force-pushed the private/arjun/recheckPendingPod branch 3 times, most recently from d8e231e to 1edaca7 on February 17, 2022 04:45
@xagent003 (Contributor, Author):

@dougbtv and @maiqueb, can you take a look? I verified that #162 did not re-break:

xagent@pop-os:~/luigi/samples/macvlan$ kubectl logs ip-reconciler-27417868-ldhrq
2022-02-17T04:28:03Z [debug] NewReconcileLooper - Kubernetes config file located at: /host/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig
2022-02-17T04:28:04Z [debug] successfully read the kubernetes configuration file located at: /host/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig
2022-02-17T04:28:04Z [debug] listing IP pools
2022-02-17T04:28:04Z [debug] Added IP 10.128.165.32 for pod default/web-0
2022-02-17T04:28:04Z [error] could not parse network annotation  for pod: default/web-1; error: unexpected end of JSON input
2022-02-17T04:28:04Z [debug] the IP reservation: IP: 10.128.165.32 is reserved for pod: default/web-0
2022-02-17T04:28:04Z [debug] pod reference default/web-0 matches allocation; Allocation IP: 10.128.165.32; PodIPs: map[10.128.165.32:{}]
2022-02-17T04:28:04Z [debug] the IP reservation: IP: 10.128.165.33 is reserved for pod: default/web-1
2022-02-17T04:28:04Z [debug] pod reference default/web-1 matches allocation; Allocation IP: 10.128.165.33; PodIPs: map[]
2022-02-17T04:28:04Z [debug] Re-fetching Pending Pod: default/web-1 IP-to-match: 10.128.165.33
2022-02-17T04:28:05Z [debug] Added IP 10.128.165.33 for pod default/web-1
2022-02-17T04:28:05Z [debug] Got refreshed pod: &{map[10.128.165.33:{}] Pending}
2022-02-17T04:28:05Z [debug] pod reference default/web-1 matches allocation; Allocation IP: 10.128.165.33; PodIPs: map[10.128.165.33:{}]
2022-02-17T04:28:05Z [debug] Pod now has IP annotation while in Pending
2022-02-17T04:28:05Z [debug] pod reference default/web-0 matches allocation; Allocation IP: 10.128.165.32; PodIPs: map[10.128.165.32:{}]
2022-02-17T04:28:05Z [debug] pod reference default/web-1 matches allocation; Allocation IP: 10.128.165.33; PodIPs: map[]
2022-02-17T04:28:05Z [debug] Re-fetching Pending Pod: default/web-1 IP-to-match: 10.128.165.33
2022-02-17T04:28:05Z [debug] Added IP 10.128.165.33 for pod default/web-1
2022-02-17T04:28:05Z [debug] Got refreshed pod: &{map[10.128.165.33:{}] Pending}
2022-02-17T04:28:05Z [debug] pod reference default/web-1 matches allocation; Allocation IP: 10.128.165.33; PodIPs: map[10.128.165.33:{}]
2022-02-17T04:28:05Z [debug] Pod now has IP annotation while in Pending
2022-02-17T04:28:05Z [debug] no IP addresses to cleanup
xagent@pop-os:~/luigi/samples/macvlan$ kubectl get ippools.whereabouts.cni.cncf.io 10.128.165.0-24 -o yaml
apiVersion: whereabouts.cni.cncf.io/v1alpha1
kind: IPPool
metadata:
  creationTimestamp: "2022-02-16T07:41:55Z"
  generation: 90
  managedFields:
  - apiVersion: whereabouts.cni.cncf.io/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        .: {}
        f:allocations:
          .: {}
          f:32:
            .: {}
            f:id: {}
            f:podref: {}
          f:33:
            .: {}
            f:id: {}
            f:podref: {}
          f:34:
            .: {}
            f:id: {}
            f:podref: {}
        f:range: {}
    manager: whereabouts
    operation: Update
    time: "2022-02-17T04:28:08Z"
  name: 10.128.165.0-24
  namespace: default
  resourceVersion: "183433"
  uid: 41f5ab5f-2b34-4a5a-94ba-71c604c0a1d2
spec:
  allocations:
    "32":
      id: f7da88514327587cdd23fb0c5d7596341d2d422dea75fd7650c82fd229eb8051
      podref: default/web-0
    "33":
      id: 286db456fecf6f516721347ba004e82a823e9e5768984340ff3b44fdfa0f3c67
      podref: default/web-1
    "34":
      id: 919134bbeb8a0b1128724bf280184b48d3f3bd35aaa7bae8727324b259cb11fb
      podref: default/web-2
  range: 10.128.165.0/24

As you can see, Pod web-1 was missing the IP annotation and was in the Pending phase. I refreshed the Pod; this time it had the IP annotation on the first re-fetch, and a match was found. So no bad cleanup happened.

@xagent003 (Contributor, Author) commented Feb 17, 2022

Ok, I verified that it also fixes #182

I shut down a node running a StatefulSet on a network whose IP pool was full, then force-deleted the Pod so it gets rescheduled. The rescheduled Pod remains in the ContainerCreating state with a "Could not allocate IP in range: ip: 10.128.165.32 / - 10.128.165.33" error.

Then the reconciler runs. It cleans up the stale IP (belonging to the Pod I force-deleted from the dead node), even though the new replica with the same podRef is in the ContainerCreating state, i.e. the Pending phase. The new replica can then be created and moves to Running.

As seen in the ip-reconciler logs:

2022-02-17T07:01:05Z [debug] Going to update the reserve list to: [IP: 10.128.165.32 is reserved for pod: default/web-0]
2022-02-17T07:01:05Z [debug] PF9: patch = [{remove /spec/allocations/33 <nil>}]
2022-02-17T07:01:05Z [debug] PF9: wrote patchdata: [{"op":"test","path":"/metadata/resourceVersion","value":"204380"},{"op":"remove","path":"/spec/allocations/33"}]
2022-02-17T07:01:05Z [debug] successfully cleanup IPs: [10.128.165.33]

I verified the IPPools. The old reservation:

spec:
  allocations:
    "32":
      id: 1e62cb66ff77fbf409633ca4cef7b2b641427c347eb2269cd23efaaba0bb1e00
      podref: default/web-0
    "33":
      id: 40d15102db0b75cad2faff2ccee483bd4e6c25fdd477af9caa2e97b71423b5f7
      podref: default/web-1
  range: 10.128.165.0/24

versus the new one:

spec:
  allocations:
    "32":
      id: 1e62cb66ff77fbf409633ca4cef7b2b641427c347eb2269cd23efaaba0bb1e00
      podref: default/web-0
    "33":
      id: 3d6225ea764ed6e3d5650110725c62aa41a22c935c9949ef1572b03c1f7e6bfd
      podref: default/web-1
  range: 10.128.165.0/24

So this fixes the new issue #182 without re-breaking the fix for #162 (skip Pods in the Pending phase).

@maiqueb (Collaborator) left a comment

Thanks for this patch; while it seems I'm making plenty of comments, I'm very happy with the quality of the code.

I wonder about the impact of this retry, especially since the retries are nearly immediate (no backoff is used).

I honestly don't expect this scenario to happen (famous last words ...), but if the cronjob is triggered while there are plenty of pending Pods (and they stay in the Pending state for long), we will overload the API server.
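
For reference, bounding these re-fetches with k8s.io/apimachinery's wait.ExponentialBackoff could look roughly like the sketch below (illustration only, not code from this PR; the condition func is a placeholder for the re-fetch-and-check step):

// Sketch only: rate-limiting the re-fetches with exponential backoff rather
// than immediate retries. The condition func is a placeholder for
// "re-fetch the Pod and check whether the expected IP annotation showed up".
package main

import (
    "fmt"
    "time"

    "k8s.io/apimachinery/pkg/util/wait"
)

func main() {
    backoff := wait.Backoff{
        Duration: 50 * time.Millisecond, // initial delay
        Factor:   2.0,                   // 50ms, 100ms, 200ms, 400ms, ...
        Jitter:   0.1,
        Steps:    5, // roughly PodRefreshRetries
    }
    attempts := 0
    // ExponentialBackoff stops as soon as the condition returns true or an error.
    err := wait.ExponentialBackoff(backoff, func() (bool, error) {
        attempts++
        // placeholder: re-fetch the Pod here and return true once the IP appears
        return attempts == 3, nil
    })
    fmt.Println("attempts:", attempts, "err:", err)
}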

_, isFound := livePodIPs[ip]
return isFound || livePod.phase == v1.PodPending
isFound := checkIpOnPod(&livePod, podRef, ip)
@maiqueb (Collaborator):

Hmmmm, I'd rename the method - not 100% sure to what.

It might be personal, but I'm very used to prefixing methods that return booleans with is. What do you think about isIPonPod(...)?

return isFound || livePod.phase == v1.PodPending
isFound := checkIpOnPod(&livePod, podRef, ip)

if !isFound && (livePod.phase == v1.PodPending) {
@maiqueb (Collaborator):

nit: it could make sense to extract this into a helper func.

* so re-fetch the Pod 5x
*/
var podToMatch *podWrapper = &livePod
var retries int = 0
@maiqueb (Collaborator):

Isn't 0 the default?

nit: I personally would prefer the variables to be initialized using :=. I'm just more used to it.

Comment on lines 116 to 134
for retries < storage.PodRefreshRetries {
    retries += 1
    podToMatch := rl.refreshPod(livePodRef)
    if podToMatch == nil {
        logging.Debugf("Cleaning up...")
        return false
    } else if podToMatch.phase != v1.PodPending {
        logging.Debugf("Pending Pod is now in phase: %s", podToMatch.phase)
        break
    } else {
        isFound = checkIpOnPod(podToMatch, podRef, ip)
        // Short-circuit - Pending Pod may have IP now
        if isFound {
            logging.Debugf("Pod now has IP annotation while in Pending")
            return true
        }
    }
}
isFound = checkIpOnPod(podToMatch, podRef, ip)
@maiqueb (Collaborator):

Would it make sense to use exponential backoff here?

Also, I think this could be re-written with k8s primitives:

import "k8s.io/apimachinery/pkg/util/wait"

timeout := time.Second
pollInterval := 200 * time.Millisecond // 5 * 0.2 = 1

// PollImmediate tries a condition func until it returns true, an error, or the timeout
// is reached.
// ...
waitErr := wait.PollImmediate(pollInterval, timeout, func() (bool, error) {
    pod := rl.refreshPod(livePodRef)

    if pod == nil {
        logging.Debugf("Cleaning up...")
        return false, fmt.Errorf("pod is gone")
    } else if !isPodPending(pod) {
        logging.Debugf("Pending Pod is now in phase: %s", pod.phase)
        return true, nil
    } else {
        return isIPonPod(pod, podRef, ip), nil // I'm not entirely sure I did not oversimplify this ...
    }
})

@xagent003 (Contributor, Author):

I am hesitant to use a backoff because, remember, for #182, when the IP Pool is full we WANT it to reconcile. In fact, we don't need any retries at all in that case, because the container will never come up otherwise: it is in the Pending state, does not have an IP annotation either, and will match the podRef.

So adding more retries or a backoff would just add load, or delay the reconciler (and thus the blocked Pod) even longer.

Any suggestions on how to distinguish between these two scenarios? Or should we just retry once?

Comment on lines 162 to 163
var namespace string
var podName string
@maiqueb (Collaborator):

nit: I don't see a reason for declaring the variables here; you can just use := below.

Or even - once you know len(namespacedName) == 2 - return directly: return namespacedName[0], namespacedName[1].

Not sure what I wrote is easy to understand; I mean something like:

func splitPodRef(podRef string) (string, string) {
    namespacedName := strings.Split(podRef, "/")
    if len(namespacedName) != 2 {
        logging.Errorf("Failed to split podRef %s", podRef)
        return "", ""
    }

    return namespacedName[0], namespacedName[1]
}

@@ -13,7 +13,8 @@ var (
    RequestTimeout = 10 * time.Second

    // DatastoreRetries defines how many retries are attempted when updating the Pool
    DatastoreRetries  = 100
    PodRefreshRetries = 5
@maiqueb (Collaborator):

This might not be needed if we end up using wait.PollImmediate or wait.ExponentialBackoff.

Comment on lines +114 to +126
pod, err := i.clientSet.CoreV1().Pods(namespace).Get(context.TODO(), name, metav1.GetOptions{})
if err != nil {
    return nil, err
}

return pod, nil
@maiqueb (Collaborator):

Hm, maybe I'm missing something, but why don't you just return i.clientSet.CoreV1().Pods(namespace).Get(context.TODO(), name, metav1.GetOptions{})?

@xagent003 (Contributor, Author) commented Feb 18, 2022

@maiqueb I'm just going to remove the retries altogether. No backoff.

Retrying means more unnecessary API calls and a longer time to reconcile for #182. In that case the Pod is Pending, has no IP annotation, and matches the podRef, but it will never come up until the old IP is reconciled, since the IP pool is full.

The retry only helps #162, which is a rarer bug to hit (the ip-reconciler has to run at the same time as Pods are coming up, and the Pending Pod has no IP annotation YET, as it is still in the process of coming up). So just re-fetching the individual Pod once should be good enough, as it makes an uncommon scenario even rarer.

I tried a few times, and whenever the race was hit, the Pod always had the IP on the first Get call.
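
For clarity, the no-retry variant amounts to something like the sketch below (the same kind of simplified stand-ins as in the PR description; not the exact diff):

// Sketch of the "single re-fetch, no retry loop" variant, using simplified
// stand-in types rather than the reconciler's actual code.
package main

import "fmt"

type podInfo struct {
    ips   map[string]struct{}
    phase string
}

func hasIP(p *podInfo, ip string) bool {
    _, ok := p.ips[ip]
    return ok
}

// stillAlive re-fetches a Pending Pod exactly once before deciding.
func stillAlive(listed *podInfo, ip string, refetch func() *podInfo) bool {
    if hasIP(listed, ip) {
        return true
    }
    if listed.phase != "Pending" {
        return false
    }
    refreshed := refetch()
    if refreshed == nil {
        return false // pod is gone; reclaim the allocation
    }
    // In the pool-full case (#182) the annotation is still missing, so the
    // stale allocation gets reclaimed on this single pass; in the race from
    // #162 the annotation has usually appeared by now.
    return hasIP(refreshed, ip)
}

func main() {
    blocked := &podInfo{ips: map[string]struct{}{}, phase: "Pending"}
    fmt.Println(stillAlive(blocked, "10.128.165.33", func() *podInfo { return blocked })) // false -> reclaim
}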

@xagent003 xagent003 force-pushed the private/arjun/recheckPendingPod branch from 1edaca7 to 72eb8ea on March 29, 2022 23:11
@xagent003 xagent003 force-pushed the private/arjun/recheckPendingPod branch from 72eb8ea to 69b4616 on March 30, 2022 03:39
@xagent003 (Contributor, Author):

@dougbtv @maiqueb Can you please take a look? I revised the PR. The test failure looks unrelated and is the same one my PR #180 is hitting.

Can you re-run/override?

@maiqueb (Collaborator) commented Mar 31, 2022

> @dougbtv @maiqueb Can you please take a look? I revised the PR. The test failure looks unrelated and is the same one my PR #180 is hitting.
>
> Can you re-run/override?

Would you rebase both PRs on #211 (which fixes the unit test issue you're facing)? We cannot override, since this is not executing any unit tests.


for retries < storage.PodRefreshRetries {
    retries += 1
    podToMatch = rl.refreshPod(livePodRef)
@maiqueb (Collaborator):

Would it make sense, instead of retrying now, to put the Pods in the Pending state into a list and, after the main loop (which handles all Pods that are not Pending), iterate over the ones that were Pending?

I am unsure if I am overthinking this.
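
As a rough sketch of that alternative (hypothetical, simplified types rather than the reconciler's actual structures):

// Sketch of the "defer Pending Pods to a second pass" idea, with simplified
// stand-in types rather than the reconciler's real code.
package main

import "fmt"

type allocation struct {
    podRef string
    ip     string
}

type podState struct {
    phase string
    ips   map[string]struct{}
}

func staleIPs(allocs []allocation, pods map[string]*podState, refetch func(ref string) *podState) []string {
    var stale []string
    var pending []allocation

    // First pass: handle every allocation whose Pod is not Pending.
    for _, a := range allocs {
        p, ok := pods[a.podRef]
        switch {
        case !ok:
            stale = append(stale, a.ip)
        case p.phase == "Pending":
            pending = append(pending, a) // decide later
        default:
            if _, found := p.ips[a.ip]; !found {
                stale = append(stale, a.ip)
            }
        }
    }

    // Second pass: Pending Pods have had extra time to receive their
    // annotations, so one re-fetch per Pod is usually enough.
    for _, a := range pending {
        p := refetch(a.podRef)
        if p == nil {
            stale = append(stale, a.ip)
            continue
        }
        if _, found := p.ips[a.ip]; !found {
            stale = append(stale, a.ip)
        }
    }
    return stale
}

func main() {
    pods := map[string]*podState{"default/web-1": {phase: "Pending", ips: map[string]struct{}{}}}
    allocs := []allocation{{podRef: "default/web-1", ip: "10.128.165.33"}}
    fmt.Println(staleIPs(allocs, pods, func(ref string) *podState { return pods[ref] })) // [10.128.165.33]
}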

@dougbtv (Member) commented Jun 23, 2022

Hey @xagent003, any chance we could see some tests that show this in action? With just one removed test, I'm not quite sure about it. Also, maybe some improvements to the reconciliation controller in the meantime.

Successfully merging this pull request may close these issues.

ip-reconciler issue with StatefulSet cleanup