cmdDel fails releasing the device when kubelet deletes pause container #126
Comments
I think the long-term solution would be to wait for the kubelet fix. As for the workaround:
I tried switching to the init namespace; however, the device is not visible there. The device only becomes visible on the host after cmdDel has been called for all the devices.
We're running into this currently. We have 4-6 interfaces in use by the CNI, but often find 1 or 2 left with a bad interface name and various settings that weren't reverted. The host usually has enough information to fix the handed-back/abandoned interfaces after the failed/incomplete cmdDel. The struggle is that we're then racing the cleanup against the pods spinning back up and requesting new interfaces; if they hit one of the abandoned interfaces before cleanup, things go south. Also, when something like the Mellanox E-Switch is involved, the host doesn't have enough information to safely nuke entries when MACs are being reused.
@zshi-redhat
Do you know why they aren't visible in the init netns? I figured that once the pod netns is deleted, the devices would return to the init netns. The SR-IOV CNI could detect that the pod netns is deleted but continue on and verify the device is in the appropriate state.
So from within the sriov-cni process, when I tried to list devices in the init ns, the devices don't show up. Only after the last cmdDel invocation finishes do the devices show up in the host ns (at least that's what I remember).
In our use case the job uses all IB devices for training. If even one device is not healthy, the job will not run.
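For anyone debugging the "not visible until the last cmdDel finishes" observation above, here is a small sketch (mine, not from sriov-cni) for checking from the host whether a VF's netdev has come back: sysfs only lists an entry under the device's net/ directory while the interface is in the namespace sysfs was mounted in, so the entry reappears once the kernel returns the VF to the host netns. The PCI address is just an example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// hostNetdevForVF returns the netdev name for a VF PCI address if the
// interface is currently visible in this (host) network namespace; sysfs
// hides the net/ entry while the interface lives in another netns.
func hostNetdevForVF(pciAddr string) (string, error) {
	entries, err := os.ReadDir(filepath.Join("/sys/bus/pci/devices", pciAddr, "net"))
	if err != nil || len(entries) == 0 {
		return "", fmt.Errorf("VF %s netdev not visible in this namespace", pciAddr)
	}
	return entries[0].Name(), nil
}

func main() {
	// Example PCI address only; substitute the VF under test.
	if name, err := hostNetdevForVF("0000:05:00.1"); err != nil {
		fmt.Println(err)
	} else {
		fmt.Println("VF visible on host as", name)
	}
}
```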
As a follow-up, our post-cmdDel() failure cleanup now resets the VF, which causes all E-Switch entries related to that VF to be removed as well. This prevents MAC collisions in the E-Switch, as we are changing MACs for bonding. Since the host namespace only sees VFs that aren't assigned out, we can 'safely' reset all VFs we see without concern about their state. Though we still have the race condition where a released VF in a bad state might be assigned out before we clean it.
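For reference, a hedged sketch of one way such a VF reset could be implemented (driver unbind/rebind through sysfs); this is not necessarily the mechanism used above, and whether it also clears the Mellanox E-Switch entries depends on the driver/firmware.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resetVF unbinds and rebinds the VF's driver via sysfs, which recreates the
// netdev with default settings. pciAddr is e.g. "0000:05:00.1". This is a
// sketch: it assumes the VF is currently bound and visible on the host.
func resetVF(pciAddr string) error {
	driverDir, err := filepath.EvalSymlinks(filepath.Join("/sys/bus/pci/devices", pciAddr, "driver"))
	if err != nil {
		return fmt.Errorf("resolve driver for %s: %w", pciAddr, err)
	}
	if err := os.WriteFile(filepath.Join(driverDir, "unbind"), []byte(pciAddr), 0200); err != nil {
		return fmt.Errorf("unbind %s: %w", pciAddr, err)
	}
	if err := os.WriteFile(filepath.Join(driverDir, "bind"), []byte(pciAddr), 0200); err != nil {
		return fmt.Errorf("bind %s: %w", pciAddr, err)
	}
	return nil
}

func main() {
	if err := resetVF("0000:05:00.1"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```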
Any updates here? Using the sriov-cni with dhcp ipam seems to exacerbate the issue as well. |
Hi @YitzyD @blackgold, a question: can you share your pod yaml? Just to be sure, are you using
Also, which container runtime are you using? I tried this with CRI-O and am not able to reproduce the issue after using #220
What happened?
Kubelet doesn't guarantee that the pause container stays alive while the CNI deletes all the devices attached to the pod. When the pause container is deleted, the netns is no longer available for cmdDel to release the device from. This leaves the device on the host with the wrong name, a missing IP, and wrong settings.
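To make the failure mode concrete, here is a rough sketch (my own, not the actual sriov-cni code) of how a DEL handler could tolerate the netns already being gone and fall back to host-only cleanup instead of failing; hostOnlyCleanup and fullCleanup are hypothetical placeholders.

```go
package main

import (
	"log"
	"os"

	"github.com/containernetworking/cni/pkg/skel"
)

// cmdDel sketches a DEL handler that tolerates the pod netns already being
// gone: args.Netns is either empty or points at a /proc path that no longer
// exists once the pause container has been removed.
func cmdDel(args *skel.CmdArgs) error {
	nsGone := args.Netns == ""
	if !nsGone {
		if _, err := os.Stat(args.Netns); os.IsNotExist(err) {
			nsGone = true
		}
	}
	if nsGone {
		log.Printf("netns %q already gone, doing host-only cleanup", args.Netns)
		return hostOnlyCleanup(args)
	}
	return fullCleanup(args)
}

func hostOnlyCleanup(args *skel.CmdArgs) error { return nil } // placeholder: reset host-side VF state
func fullCleanup(args *skel.CmdArgs) error     { return nil } // placeholder: enter netns, move interface back, then reset

func main() {
	// In a real plugin this would be wired up through the CNI skel package;
	// omitted here to keep the sketch self-contained.
	_ = cmdDel
}
```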
What did you expect to happen?
Kubelet should provide some guarantee that the netns is available until the CNI has deleted all attached devices.
What are the minimal steps needed to reproduce the bug?
Attach at least 4 SR-IOV devices to a pod. Kill the pod.
To consistently reproduce the error, add a 1-second sleep in cmdDel (a toy illustration follows).
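A toy illustration of the race (names and structure are mine, not the plugin's): DEL is invoked once per attached device, so a 1-second delay per device gives kubelet several seconds in which to remove the pause container mid-teardown.

```go
package main

import (
	"fmt"
	"time"
)

// cmdDel stands in for the plugin's DEL handler; only the added delay matters.
func cmdDel(device string) error {
	time.Sleep(1 * time.Second) // artificial delay to widen the race window
	fmt.Println("released", device)
	return nil
}

func main() {
	// Simulate the sequential DEL invocations for four attached SR-IOV devices.
	for _, dev := range []string{"net1", "net2", "net3", "net4"} {
		if err := cmdDel(dev); err != nil {
			fmt.Println(err)
		}
	}
}
```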
Anything else we need to know?
Raised the issue with Kubernetes but was unable to get any positive response.
kubernetes/kubernetes#89440
As a workaround, we run a daemon that periodically tries to fix the broken devices on the host.
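A rough sketch of what such a daemon might look like; the PF name, the expected VF naming scheme, and the use of ip link are all assumptions here, not the author's actual tool.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
	"time"
)

// fixVFNames walks the VFs of one PF via sysfs and renames any VF netdev
// that is visible on the host but no longer carries its expected name.
// VFs still inside a pod netns have no net/ entry here and are skipped.
func fixVFNames(pf string) {
	vfLinks, _ := filepath.Glob(filepath.Join("/sys/class/net", pf, "device", "virtfn*"))
	for _, vfLink := range vfLinks {
		entries, err := os.ReadDir(filepath.Join(vfLink, "net"))
		if err != nil || len(entries) == 0 {
			continue // VF assigned to a pod, or no netdev bound
		}
		current := entries[0].Name()
		idx := strings.TrimPrefix(filepath.Base(vfLink), "virtfn")
		expected := fmt.Sprintf("%svf%s", pf, idx) // assumed naming scheme
		if current == expected {
			continue
		}
		// Interfaces must be down to be renamed; bring the VF back to its expected name.
		_ = exec.Command("ip", "link", "set", "dev", current, "down").Run()
		if err := exec.Command("ip", "link", "set", "dev", current, "name", expected).Run(); err != nil {
			fmt.Fprintf(os.Stderr, "rename %s -> %s: %v\n", current, expected, err)
		}
	}
}

func main() {
	for {
		fixVFNames("enp5s0") // example PF name; adjust as needed
		time.Sleep(30 * time.Second)
	}
}
```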
Component Versions
Please fill in the below table with the version numbers of applicable components used.
Config Files
Config file locations may be config dependent.
CNI config (Try '/etc/cni/net.d/')
Device pool config file location (Try '/etc/pcidp/config.json')
Multus config (Try '/etc/cni/multus/net.d')
Kubernetes deployment type ( Bare Metal, Kubeadm etc.)
Kubeconfig file
SR-IOV Network Custom Resource Definition
Logs
SR-IOV Network Device Plugin Logs (use kubectl logs $PODNAME)
Added some custom logs to print cmdArgs and netns:
time="2020-04-24T17:22:52Z" level=info msg="read from cache &{NetConf:{CNIVersion:0.3.1 Name:sriov-network Type:sriov Capabilities:map[] IPAM:{Type:} DNS:{Nameservers:[] Domain: Search:[] Options:[]} RawPrevResult:map[dns:map[] interfaces:[map[name:net1 sandbox:/proc/4281/ns/net]]] PrevResult:} DPDKMode:false Master:enp5s0 MAC: AdminMAC: EffectiveMAC: Vlan:0 VlanQoS:0 DeviceID:0000:05:00.1 VFID:0 HostIFNames:net1 ContIFNames:net1 MinTxRate: MaxTxRate: SpoofChk: Trust: LinkState: Delegates:[{CNIVersion:0.3.1 Name:sbr Type:sbr Capabilities:map[] IPAM:{Type:} DNS:{Nameservers:[] Domain: Search:[] Options:[]} RawPrevResult:map[] PrevResult:}] RuntimeConfig:{Mac:} IPNet:}"
time="2020-04-24T17:22:52Z" level=info msg="empty netns , error = failed to Statfs "/proc/4281/ns/net": no such file or directory"
time="2020-04-24T17:22:52Z" level=info msg="ReleaseVF "
time="2020-04-24T17:22:52Z" level=error msg="failed to get netlink device with name net1"
Multus logs (If enabled. Try '/var/log/multus.log' )
Kubelet logs (journalctl -u kubelet)
Mar 23 21:04:42 dgx0098 kubelet[29124]: 2020-03-23T21:04:42Z [error] Multus: error in invoke Delegate del - "sriov": error in removing device from net namespace: 1failed to get netlink device with name net3: Link not found
Mar 23 21:04:42 dgx0098 kubelet[29124]: 2020-03-23T21:04:42Z [debug] delegateDel: , net2, &{{0.3.1 sriov-network sriov map[] {} {[] [] []}} { []} false false [123 34 99 110 105 86 101 114 115 105 111 110 34 58 34 48 46 51 46 49 34 44 34 100 101 108 101 103 97 116 101 115 34 58 91 123 34 99 110 105 86 101 114 115 105 111 110 34 58 34 48 46 51 46 49 34 44 34 110 9 101 34 58 34 115 98 114 34 44 34 116 121 112 101 34 58 34 115 98 114 34 125 93 44 34 100 101 118 105 99 101 73 68 34 58 34 48 48 48 48 58 48 99 58 48 48 46 49 34 44 34 110 97 109 101 34 58 34 115 114 105 111 118 45 110 101 116 119 111 114 107 34 44 34 116 121 112 101 34 58 34 115 114 105 111 118 34 125]}, &{cfba15035e7ef328153ba5c88853b52f97740560bc27a0707ab2f5b536a8f863 /proc/32764/ns/net net2 [[IgnoreUnknown 1] [K8S_POD_NAMESPACE user] [K8S_POD_NAME 847138-worker-1] [K8S_POD_INFRA_CONTAINER_ID cfba15035e7ef328153ba5c88853b52f97740560bc27a0707ab2f5b536a8f863]] map[] }, /opt/cni/bin
Mar 23 21:04:42 dgx0098 kubelet[29124]: 2020-03-23T21:04:42Z [verbose] Del: user:847138-worker-1:sriov-network:net2 {"cniVersion":"0.3.1","delegates":[{"cniVersion":"0.3.1","name":"sbr","type":"sbr"}],"deviceID":"0000:0c:00.1","name":"sriov-network","type":"sriov"}
Mar 23 21:04:46 dgx0098 kubelet[29124]: I0323 21:04:46.544632 29124 plugins.go:391] Calling network plugin cni to tear down pod "847138-worker-1_user"