Rebases #180: ip-reconciler: Use ContainerID instead of PodRef #399

Open · wants to merge 3 commits into master

Conversation

@dougbtv (Member) commented Dec 20, 2023

This is a (somewhat manual) rebase of #180 --

It looks like it's working in my testing, but... when I first started testing it, I intermittently saw the wrong keys being deleted, and the wrong IPs deleted along with them.

So I was getting failures like:

Whereabouts IP reconciler
/home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/reconciler/ip_test.go:34
  reconciling an IP pool with multiple pods attached
  /home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/reconciler/ip_test.go:98
    each with IP from the same IPPool
    /home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/reconciler/ip_test.go:122
      reconciling the IPPool
      /home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/reconciler/ip_test.go:139
        should report the dead pod's IP address as deleted [It]
        /home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/reconciler/ip_test.go:146

        Expected
            <[]net.IP | len:1, cap:1>: [
                [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 10, 10, 10, 2],
            ]
        to equal
            <[]net.IP | len:1, cap:1>: [
                [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 10, 10, 10, 1],
            ]

        /home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/reconciler/ip_test.go:150

However, after whatever finagling I was doing, those failures have gone away... or so I think.

So I've been testing it in a loop to repeat it a bunch of times, like:

#!/bin/bash

for i in {1..20}; do
    echo "Running test-go.sh - Attempt $i"
    ./hack/test-go.sh > /tmp/test_output_$i.txt 2>&1
    if [ $? -ne 0 ]; then
        echo "Test failed on attempt $i, output saved to /tmp/test_output_$i.txt"
    else
        rm /tmp/test_output_$i.txt
        echo "Test passed on attempt $i"
    fi
done

@s1061123 (Member)

I understand that the PR changes whereabouts so that the identifier of an allocation is the container ID instead of the podRef (pod namespace/name). So how about also changing the reconciler and the other code that matches a Pod against the allocation pool, i.e. garbageCollectPodIPs() in pkg/controlloop/pod.go?

@dougbtv (Member, Author) commented Dec 21, 2023

Tomo, not sure I'm following... This is to change the identifier used during reconciliation. Do you mean changing the identifier to the pod and allocation pool? I'm probably being dense.

@dougbtv (Member, Author) commented Dec 21, 2023

I actually ran this 50 times and got 3 failures...

One, I believe, was a flake with a timeout starting the network controller, and the other two were these errors:

------------------------------
• Failure [0.001 seconds]
Whereabouts IP reconciler
/home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/reconciler/ip_test.go:37
  reconciling an IP pool with multiple pods attached
  /home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/reconciler/ip_test.go:101
    each with IP from the same IPPool
    /home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/reconciler/ip_test.go:125
      reconciling the IPPool
      /home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/reconciler/ip_test.go:142
        the IPPool should have only the IP reservation of the live pod [It]
        /home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/reconciler/ip_test.go:157

        Mismatch at key: 1
        Expected
            <string>: default/pod1
        to equal
            <string>: 

        /home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/reconciler/ip_test.go:172

and

------------------------------
• Failure [1.003 seconds]
IPControlLoop
/home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/controlloop/pod_controller_test.go:37
  a running pod on a node
  /home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/controlloop/pod_controller_test.go:61
    IPPool featuring an allocation for the pod
    /home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/controlloop/pod_controller_test.go:116
      the network attachment is available
      /home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/controlloop/pod_controller_test.go:127
        when the associated pod is deleted
        /home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/controlloop/pod_controller_test.go:160
          stale IP addresses are garbage collected [It]
          /home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/controlloop/pod_controller_test.go:165

          Timed out after 1.001s.
          the ip control loop should have removed this stale address
          Expected
              <map[string]v1alpha1.IPAllocation | len:1>: {
                  "0": {
                      ContainerID: "",
                      PodRef: "default/tiny-winy-pod",
                  },
              }
          to be empty

          /home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/controlloop/pod_controller_test.go:170

@s1061123 (Member)

This is to change the identifier used during reconciliation. Do you mean changing the identifier to the pod and allocation pool? I'm probably being dense.
Should it be changed?

if allocation.PodRef == podID(podNamespace, podName) {
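
For illustration only: a minimal, self-contained sketch of the idea behind this PR, keying the match on container ID rather than podRef. The IPAllocation struct below is a local stand-in that only mirrors the ContainerID/PodRef fields visible in the failure output earlier in this thread; it is not the actual whereabouts type, and findByContainerID is a hypothetical helper, not code from the PR.

package main

import "fmt"

// IPAllocation is a local stand-in mirroring the two fields visible in the
// test output above; it is not the real whereabouts type.
type IPAllocation struct {
	ContainerID string
	PodRef      string
}

// findByContainerID returns the index of the allocation owned by containerID,
// or -1 if none matches. It is the container-ID analogue of the PodRef
// comparison quoted above.
func findByContainerID(allocations []IPAllocation, containerID string) int {
	for i, a := range allocations {
		if a.ContainerID == containerID {
			return i
		}
	}
	return -1
}

func main() {
	allocations := []IPAllocation{
		{ContainerID: "aaa111", PodRef: "default/pod1"},
		{ContainerID: "bbb222", PodRef: "default/pod1"},
	}
	// Both allocations share the same PodRef, so matching by PodRef is
	// ambiguous; matching by ContainerID picks out exactly one entry.
	fmt.Println(findByContainerID(allocations, "bbb222")) // prints 1
}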

@dougbtv (Member, Author) commented Dec 21, 2023

So, in a discussion with Tomo, he demonstrated that under CRI-O the container ID doesn't match the pod sandbox ID, so this won't work under all runtimes. We'll need a different approach for having a unique ID to delete on across container runtimes.

@zshi-redhat (Contributor)

So, in a discussion with Tomo, he demonstrated that under CRI-O the container ID doesn't match the pod sandbox ID, so this won't work under all runtimes. We'll need a different approach for having a unique ID to delete on across container runtimes.

I discussed this with Tomo; it seems the containerID and sandboxID are the same on OpenShift with CRI-O (I tested it on 4.14 with this PR applied).

@s1061123 (Member) commented Feb 1, 2024

I discussed this with Tomo; it seems the containerID and sandboxID are the same on OpenShift with CRI-O (I tested it on 4.14 with this PR applied).

Let me correct that. The containerID and sandboxID are different in CRI-O/OpenShift, but @zshi-redhat tested and verified that the PR fixes the issue.

@mlguerrero12 (Collaborator) commented Feb 1, 2024

Hi all, I had a look at this and I think this fix is not complete. First of all, the issue happens when, within a pool, there are two entries (different IPs) for the same pod reference. When the ip-control-loop runs, the method findOrphanedIPsPerPool can return different values depending on the order of the entries (ipReservations) of an allocation and on the order of the list of pods (listing pods is not ordered). As a result, the function isPodAlive within findOrphanedIPsPerPool will fail for both entries if the pod with unknown status is listed first. I fixed this with some small changes in isPodAlive:

func (rl ReconcileLooper) isPodAlive(podRef string, ip string) bool {
	for livePodRef, livePod := range rl.liveWhereaboutsPods {
		if podRef == livePodRef {
			isFound := isIpOnPod(&livePod, podRef, ip)
			if isFound {
				return true
			} else if !isFound && (livePod.phase == v1.PodPending) {
				/* Sometimes pods are still coming up, and may not yet have Multus
				 * annotation added to it yet. We don't want to check the IPs yet
				 * so re-fetch the Pod 5x
				 */
				podToMatch := &livePod
				retries := 0

				logging.Debugf("Re-fetching Pending Pod: %s IP-to-match: %s", livePodRef, ip)

				for retries < storage.PodRefreshRetries {
					retries += 1
					podToMatch = rl.refreshPod(livePodRef)
					if podToMatch == nil {
						logging.Debugf("Cleaning up...")
						return false
					} else if podToMatch.phase != v1.PodPending {
						logging.Debugf("Pending Pod is now in phase: %s", podToMatch.phase)
						return isIpOnPod(podToMatch, podRef, ip)
					} else {
						isFound = isIpOnPod(podToMatch, podRef, ip)
						// Short-circuit - Pending Pod may have IP now
						if isFound {
							logging.Debugf("Pod now has IP annotation while in Pending")
							return true
						}
						time.Sleep(time.Duration(500) * time.Millisecond)
					}
				}
			}
		}
	}
	return false
}

Regarding the use of container ID, I think it's a bit odd that the person who wrote ReconcileIPPools used podRef instead of ContainerID, given that this function calls IterateForDeallocation, which is also used by the pod controller at the moment of deallocation; in fact, the argument of that function is explicitly named containerID. Perhaps there is a good reason for this. In any case, given that some runtimes do not have a one-to-one correspondence between container ID and sandbox ID, we could use the IP (together with the podRef) to identify an allocation. The allocations are stored in a map whose keys are the offsets of the reserved IPs with respect to the network range, and this offset is used to calculate the IP of the IPReservation (done in toIPReservationList); a short sketch of that offset-to-IP relationship follows the function below. So I modified ReconcileIPPools based on this. Note that I didn't modify IterateForDeallocation so as not to affect the pod controller, but perhaps we should; it makes sense there as well.

func (rl ReconcileLooper) ReconcileIPPools(ctx context.Context) ([]net.IP, error) {
	matchByPodRefAndIp := func(reservations []types.IPReservation, podRef string, ip net.IP) int {
		foundidx := -1
		for idx, v := range reservations {
			if v.PodRef == podRef && v.IP.String() == ip.String() {
				return idx
			}
		}
		return foundidx
	}

	var totalCleanedUpIps []net.IP

	for _, orphanedIP := range rl.orphanedIPs {
		currentIPReservations := orphanedIP.Pool.Allocations()
		for _, allocation := range orphanedIP.Allocations {
			pos := matchByPodRefAndIp(currentIPReservations, allocation.PodRef, allocation.IP)
			if pos < 0 {
				// Should never happen
				return nil, fmt.Errorf("did not find reserved IP for container %v", allocation.PodRef)
			}

			// Delete entry
			currentIPReservations[pos] = currentIPReservations[len(currentIPReservations)-1]
			currentIPReservations = currentIPReservations[:len(currentIPReservations)-1]

			logging.Debugf("Going to update the reserve list to: %+v", currentIPReservations)
			if err := orphanedIP.Pool.Update(ctx, currentIPReservations); err != nil {
				return nil, logging.Errorf("failed to update the reservation list: %v", err)
			}
			totalCleanedUpIps = append(totalCleanedUpIps, allocation.IP)
		}
	}

	return totalCleanedUpIps, nil
}
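
For illustration only: a minimal, self-contained sketch of the offset-to-IP relationship described above, i.e. how an allocation map key such as "1" or "2" can be turned back into a concrete address by adding that offset to the first IP of the range. It uses only the Go standard library and assumes a range starting at 10.10.10.0, as in the tests above; it is not the whereabouts toIPReservationList implementation, and ipAtOffset is a hypothetical helper name.

package main

import (
	"fmt"
	"math/big"
	"net"
)

// ipAtOffset adds an integer offset to a base IP by going through big.Int,
// which works for both IPv4 (in 16-byte form) and IPv6 addresses.
func ipAtOffset(base net.IP, offset int64) net.IP {
	i := new(big.Int).SetBytes(base.To16())
	i.Add(i, big.NewInt(offset))
	b := i.Bytes()
	// Left-pad back to 16 bytes, since big.Int drops leading zeros.
	padded := make([]byte, 16)
	copy(padded[16-len(b):], b)
	return net.IP(padded)
}

func main() {
	base := net.ParseIP("10.10.10.0")
	fmt.Println(ipAtOffset(base, 1)) // 10.10.10.1, i.e. allocation key "1"
	fmt.Println(ipAtOffset(base, 2)) // 10.10.10.2, i.e. allocation key "2"
}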

I tested all of this in my setup and wrote a unit test as well. Disclaimer: the fake k8sclient does not support having two pods with the same name; only the last one is stored. Other scenarios were tested by manually changing the order.

	Context("reconciling an IP pool with entries from the same pod reference", func() {
		var wbClient wbclient.Interface

		var podRunning *v1.Pod
		var podTerminated *v1.Pod

		It("verifies that the entry for the pod in unknown state is the only one deleted", func() {
			By("creating 2 pods with the same name and namespace. 1st pod without network-status annotation and in an unknown state")
			podTerminated = generatePod(namespace, podName, ipInNetwork{})
			podTerminated.Status.Phase = v1.PodUnknown
			podTerminated.ObjectMeta.ResourceVersion = "10"
			k8sClientSet = fakek8sclient.NewSimpleClientset(podTerminated)

			podRunning = generatePod(namespace, podName, ipInNetwork{ip: firstIPInRange, networkName: networkName})
			k8sClientSet = fakek8sclient.NewSimpleClientset(podRunning)

			By("creating an IP pool with 2 entries from the same pod. First entry corresponds to a previous pod stuck in terminating state")
			pool := generateIPPoolSpec(ipRange, namespace, podName)
			pool.Spec.Allocations = map[string]v1alpha1.IPAllocation{
				"1": {
					PodRef: fmt.Sprintf("%s/%s", namespace, podName),
				},
				"2": {
					PodRef: fmt.Sprintf("%s/%s", namespace, podName),
				},
			}
			wbClient = fakewbclient.NewSimpleClientset(pool)

			By("initializing the reconciler")
			var err error
			reconcileLooper, err = NewReconcileLooperWithClient(context.TODO(), kubernetes.NewKubernetesClient(wbClient, k8sClientSet, timeout), timeout)
			Expect(err).NotTo(HaveOccurred())

			By("reconciling and checking that the correct entry is deleted")
			deletedIPAddrs, err := reconcileLooper.ReconcileIPPools(context.TODO())
			Expect(err).NotTo(HaveOccurred())
			Expect(deletedIPAddrs).To(Equal([]net.IP{net.ParseIP("10.10.10.2")}))

			By("verifying the IP pool")
			poolAfterCleanup, err := wbClient.WhereaboutsV1alpha1().IPPools(namespace).Get(context.TODO(), pool.GetName(), metav1.GetOptions{})
			Expect(err).NotTo(HaveOccurred())
			remainingAllocation := map[string]v1alpha1.IPAllocation{
				"1": {
					PodRef: fmt.Sprintf("%s/%s", namespace, podName),
				},
			}
			Expect(poolAfterCleanup.Spec.Allocations).To(Equal(remainingAllocation))
		})
	})

@dougbtv, @s1061123, @zshi-redhat: please let me know if this makes sense to you.

@pliurh commented Mar 6, 2024

@mlguerrero12 I proposed a new PR to address this issue. PTAL.
