Race issue after node reboot #1221
Comments
Just an update: adding -f to the copy command looks like it fixes the issue.
Coincidentally, we also saw this error crop up yesterday with one of our edge clusters after rebooting.
As an FYI, I see that different deployment YAMLs use different ways to copy the CNI binary in the init container; the first one[1] is at multus-cni/deployments/multus-daemonset.yml, line 207 in 8e5060b. I'm not sure that copying the file atomically will solve the above issue, though (see the sketch after this comment).
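A minimal sketch of what an atomic copy could look like, assuming the shim paths used by the upstream daemonset (/usr/src/multus-cni/bin in the image, /host/opt/cni/bin on the host; both paths are assumptions here). rename(2) replaces the directory entry without ever opening the running binary for writing, so it avoids the "Text file busy" failure, though whether that fixes the race described in this issue is exactly the open question.

```sh
# Hypothetical atomic install: copy to a temp name on the same filesystem,
# then rename over the target. rename(2) is atomic and does not open the
# destination binary for writing, so a running shim does not block it.
cp /usr/src/multus-cni/bin/multus-shim /host/opt/cni/bin/.multus-shim.tmp
mv /host/opt/cni/bin/.multus-shim.tmp /host/opt/cni/bin/multus-shim
```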
This should hopefully be addressed with #1213.
Saw this in minikube today. No rebooting, just starting up a new minikube cluster.
I also got a reproduction after rebooting a node and having multus restart. I mitigated it by deleting /opt/cni/bin/multus-shim on the node.
Seems I can make this happen any time I ungracefully restart a node, worker or master: it creates this error and stops pod network sandbox recreation completely on that node. The fix mentioned above does work, but this likely means a power outage on a node will require manual intervention, whereas without multus that is not required. This error should be handled properly.
+1. This seems like a pretty serious issue. Can we get a fix merged for it soon, please?
Can additionally confirm this behavior. As @dougbtv mentioned, removing /opt/cni/bin/multus-shim works around it.
+1, happened to me as well, cluster did not come up. Any chance of fixing this soon?
Same here, on a kubespray 1.29 cluster.
This certainly needs to be fixed right away.
@dougbtv: Hit exactly the same issue. Deleting /opt/cni/bin/multus-shim helps. When could this be fixed?
Hit the same issue with kube-ovn. Already posted it there (kubeovn/kube-ovn#4470).
Also hit me today on a node that crashed. Any indication that this fix is going to be picked up any time soon?
Had the same problem today on a Talos Kubernetes cluster. I modified the kube-multus-ds init containers to check for an existing multus-shim file before copying; the original and new commands are sketched below. This worked for me 👍
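A sketch of what the original and new commands would roughly look like, assuming the shim paths from the upstream daemonset (the exact image paths may differ):

```sh
# Original command (assumed): unconditionally copy the shim. This fails
# with "Text file busy" if crio is still executing the old binary.
cp /usr/src/multus-cni/bin/multus-shim /host/opt/cni/bin/multus-shim

# New command: only copy when no shim is present on the host yet.
test -f /host/opt/cni/bin/multus-shim || \
  cp /usr/src/multus-cni/bin/multus-shim /host/opt/cni/bin/multus-shim
```

Note the caveat raised below: checking only for existence means an upgrade will leave the old shim in place.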
Thanks @iSenne, I already use this code in my Kubernetes cluster.
Keep in mind that upgrading multus would leave you with an old shim file if you only check for existence.
Hey, we are also really blocked by this issue. What can we do to push this forward?
An immediate mitigation that will get Multus running temporarily is to edit the DaemonSet directly and modify the cp command to add the -f flag: scroll to the multus-installer initContainer and adjust its command as sketched below.
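For illustration, the edited command would look something like this; the source and destination paths are assumptions based on the upstream daemonset:

```sh
# With -f, when cp cannot open the destination for writing (e.g. "Text
# file busy" because the old shim is still executing), it unlinks the
# destination and retries instead of failing.
cp -f /usr/src/multus-cni/bin/multus-shim /host/opt/cni/bin/multus-shim
```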
I think the concern at this point for folks wanting to use multus is not about having a "workaround", but the seeming inability to get a fix upstreamed, leading to questions about the health of the multus project. #1213, for example, has been open since Jan 18 and hasn't gotten any comment since Aug 12. Please don't see this comment as knocking the devs' hard work. It is very much appreciated, really. Just trying to gauge the health of the project, though.
So crazy this has been ignored by maintainers this long. 🙄
FYI: I made a PR to add the -f flag.
This has been bothering me for quite some time: whenever I do node maintenance, the whole cluster does not come up and I have to delete the shim binary manually.
Hi, it looks like there is an issue after a node reboot where we can have a race in multus that will prevent pods from starting.

The problem is mainly that, after a reboot, multus-shim gets called by crio to start pods, but the multus pod is not able to start because the init container fails to cp the shim.

The reason the copy fails is that crio called the shim, which is stuck waiting for communication with the multus pod.
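For anyone wanting to see the underlying failure mode in isolation, here is a minimal reproduction on a plain Linux shell (no multus involved; the file names are made up for the demo):

```sh
# Simulate the race: a binary is still executing while we try to overwrite it.
cp /bin/sleep /tmp/shim
/tmp/shim 60 &              # stands in for multus-shim invoked by crio

cp /bin/sleep /tmp/shim     # fails: cp: cannot create regular file
                            # '/tmp/shim': Text file busy

# Unlinking first (what cp -f or a rename-based install effectively does)
# succeeds, because removing the directory entry of a running binary is fine.
cp -f /bin/sleep /tmp/shim
```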