New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

[Windows] CNI Server installs OpenFlow entries by PortStatus #6763

Open

wenyingd wants to merge 1 commit into antrea-io:main from wenyingd:async_ofport

+731 −73

Contributor

wenyingd commented Oct 23, 2024 •

edited

Loading

This change has introduced podIfaceMonitor in CNI Server module, which listens for the OpenFlow PortStatus message when a new OpenFlow port is allocated in OVS. After receiving the message, antrea-agent Windows will install Pod related OpenFlow entries. If the OpenFlow port is not allocated within 30s after the CmdAdd request is responded, an annotation is added on the Pod with key "pod.antrea.io/not-ready". The annotation is removed if the Pod network is ready.

Fix: #6721

wenyingd requested review from antoninbas, tnqn and XinShuYang

October 23, 2024 02:00

wenyingd force-pushed the async_ofport branch from 9c74fdb to 370e578 Compare

October 23, 2024 03:10


          [Windows] CNI Server installs OpenFlow entries after PortStatus messa…

4fcbe6f

…ge is received

Signed-off-by: Wenying Dong <[email protected]>

wenyingd force-pushed the async_ofport branch from 370e578 to 4fcbe6f Compare

October 23, 2024 03:45

antoninbas reviewed

View reviewed changes

pkg/agent/cniserver/pod_configuration_windows.go

+              	createTime   time.Time
+              }
+              type podIfaceMonitor struct {

Contributor

antoninbas Oct 23, 2024

when I look at all this new code, it seems that there is nothing platform-specific about it: it could build and run on Linux just as well as on Windows (unless I am missing something). Given this, could we move the podInfaceMonitor code to a separate file or sub-package, that will not be platform-specific. It will make it much easier to update that code and unit test it locally on macOS laptops or Linux dev servers. I think this code deserved to be unit-tested independently of the rest of the pod configuration code.

On Windows, we can use a pointer to an actual instance of podIfaceMonitor, and on Linux, we can use a nil pointer.

pkg/agent/cniserver/pod_configuration_windows.go

+              	}
+              }
+              func (m *podIfaceMonitor) monitorUnReadyInterface(stopCh <-chan struct{}) {

Contributor

antoninbas Oct 23, 2024

typically by convention we name a function like this Run, and we call Run in its own goroutine

pkg/agent/cniserver/pod_configuration_windows.go

+              func (m *podIfaceMonitor) updatePodFlows(ifName string, ofPort int32) error {
+              	ifConfig, found := m.ifaceStore.GetInterfaceByName(ifName)
+              	if !found {
+              		klog.Info("Interface config is not found", "name", ifName)

Contributor

antoninbas Oct 23, 2024

you probably meant to use klog.InfoS here. Please check other occurrences as well.

pkg/agent/cniserver/pod_configuration_windows.go

+              	m.ifaceStore.UpdateInterface(ifConfig)
+              	// Install OpenFlow entries for the Pod.
+              	klog.V(2).Infof("Setting up Openflow entries for container %s", containerID)

Contributor

antoninbas Oct 23, 2024

Please make sure you use structured logging (klog.InfoS / klog.ErrorS) in all new code

pkg/agent/cniserver/pod_configuration_windows.go

+              	return nil
+              }
+              func (m *podIfaceMonitor) updatePodUnreadyAnnotation(podNamespace, podName string, addAnnotation bool) {

Contributor

antoninbas Oct 23, 2024

I think this function would benefit from having an actual context.Context as a parameter

It's true that when we started the Antrea codebase, contexts were not as much of a thing, so we have a lot of "placeholders" (context.TODO() or context.Background()). For this new function, we should add ctx context.Context, and the caller can choose what to do.

pkg/agent/cniserver/pod_configuration_windows.go

+              				"annotations": map[string]interface{}{types.PodNotReadyAnnotationKey: ""},
+              			},
+              		})
+              		m.kubeClient.CoreV1().Pods(podNamespace).Patch(context.Background(), podName, apitypes.MergePatchType, patch, metav1.PatchOptions{})

Contributor

antoninbas Oct 23, 2024

should we check the error?

pkg/agent/cniserver/pod_configuration_windows.go

+              }
+              func (m *podIfaceMonitor) updatePodUnreadyAnnotation(podNamespace, podName string, addAnnotation bool) {
+              	pod, err := m.kubeClient.CoreV1().Pods(podNamespace).Get(context.Background(), podName, metav1.GetOptions{})

Contributor

antoninbas Oct 23, 2024

usually we avoid reading directly from the K8s API. Is using an informer possible here?

pkg/agent/cniserver/pod_configuration_windows.go

+              		podInfo := value.(*unReadyPodInfo)
+              		if !podInfo.annotated && time.Now().Sub(podInfo.createTime).Seconds() > podNotReadyTimeInSeconds {
+              			m.updatePodUnreadyAnnotation(podInfo.podNamespace, podInfo.podName, true)
+              			podInfo.annotated = true

Contributor

antoninbas Oct 23, 2024

I think this should be conditional: patching the Pod can fail and we want to be able to try again at the next tick.

pkg/agent/cniserver/pod_configuration_windows.go

+              }
+              func (m *podIfaceMonitor) updateUnReadyPod(status *openflow15.PortStatus) {
+              	ovsPort := string(bytes.Trim(status.Desc.Name, "\x00"))

Contributor

antoninbas Oct 23, 2024

It feels like this should be handled in libOpenflow?

pkg/agent/cniserver/pod_configuration_windows.go

Comment on lines +230 to +233

+              	if err != nil {
+              		klog.ErrorS(err, "Failed to get Pod when trying to update 'unready' annotations", "Namespace", podNamespace, "Name", podName)
+              		return
+              	}

Contributor

antoninbas Oct 23, 2024

Is it possible for a Pod to no longer exist and still be in the map (because of some failure)? In this case we would never delete it from the map.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet