upgrading to runc 1.1.6 / 1.1.7 breaks #3223

BenTheElder · 2023-05-12T16:18:51Z

See: #3220, https://kubernetes.slack.com/archives/CEKK1KTN2/p1683851267796889

#3221 and #3222 have test results.

The failure mode is like:

{ failed [FAILED] failed to run command '/agnhost dns-suffix' on pod, stdout: , stderr: , err: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "69cc58960cac2f68567eec2a84011360392a649b12c91f955d6ddfeb6e34b80a": OCI runtime exec failed: exec failed: unable to start container process: error adding pid 78936 to cgroups: failed to write 78936: openat2 /sys/fs/cgroup/unified/kubelet.slice/kubelet-kubepods.slice/kubelet-kubepods-besteffort.slice/kubelet-kubepods-besteffort-pod60faed4d_72a9_453e_be90_24e96a46d7e5.slice/cri-containerd-d9b097ed59d27ead25448f023222ac459cc26d37682af62785f486008271b361.scope/cgroup.procs: no such file or directory: unknown: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "69cc58960cac2f68567eec2a84011360392a649b12c91f955d6ddfeb6e34b80a": OCI runtime exec failed: exec failed: unable to start container process: error adding pid 78936 to cgroups: failed to write 78936: openat2 /sys/fs/cgroup/unified/kubelet.slice/kubelet-kubepods.slice/kubelet-kubepods-besteffort.slice/kubelet-kubepods-besteffort-pod60faed4d_72a9_453e_be90_24e96a46d7e5.slice/cri-containerd-d9b097ed59d27ead25448f023222ac459cc26d37682af62785f486008271b361.scope/cgroup.procs: no such file or directory: unknown

This is limited to cgroups v1, this happens in our CI environment which is further awful by way of host node containerd => dockerd (in CI cluster pod) => kind (running against that nested dockerd)

BenTheElder · 2023-05-15T17:58:27Z

xref: kubernetes/k8s.io#5276 (we'll need to make sure we continue to cover v1 elsewhere for some time)

BenTheElder · 2023-05-15T23:17:04Z

I ran an image built for k8s 1.27.1 with #3221 but on a GKE 1.26 / cgroupv2 node pool and no issues so far. While checking into kubernetes/k8s.io#5276

Eventually a lot of these headaches will go away with v1 goes away, but not just yet, probably another 1-2 years.

BenTheElder · 2023-05-15T23:25:46Z

In the CI nested environment we have:

runc version 1.1.5
commit: v1.1.5-0-gf19387a
spec: 1.0.2-dev
go: go1.19.7
libseccomp: 2.5.1

with docker 23.0.4

However there is also the host CI node level containerd/runc. I don't think I have direct access to the k8s infra CI nodes so it's not quite as easy to confirm the versions there.

BenTheElder · 2023-05-19T22:42:08Z

With #3221 rebased atop #3241 / #3240 runc 1.1.6 is working in Prow CI.

That leaves k8s < 1.24 to consider. Looking into options.

BenTheElder · 2023-05-23T05:52:55Z

I think we can just push older k8s versions with a base image from to make -C images/base push EXTRA_BUILD_OPT=--build-arg=RUNC_VERSION=v1.1.5 TAG_SUFFIX=_runc-v1.1.5 until we phase them out and add a release note about this.

BenTheElder · 2023-05-23T18:44:33Z

There's still some less frequent issues with misc controller in kubernetes CI. ref: #3250 (comment)

The host runtime is not aware of misc, and probably won't be for a while.

BenTheElder · 2023-05-24T19:48:54Z

The trick from @kolyshkin to unmount the misc controller doesn't appear to work even if we add some logic to consider misc unsupported when on cgroupv1 + kubernetes without the kubelet runc update.

Tentatively systemd discovers that misc is available and enabled on the host kernel via /proc/cgroups and mounts it back after we've removed it.

We have a bug currently where we'd mount it back as well, but even after fixing this and confirming it's not mounted before exec to systemd I see it is mounted later after the node container comes up.

After inspecting systemd's logic for this, considering bind mounting a modified /proc/cgroups to pretend misc isn't available when on cgroups v1 hosts 🙈 We could unmount that shortly after systemd comes up / bootstraps cgroups.

Someday we will only need to support hosts with cgroups v2 and we can phase out most of the nonsense kind employs currently. At least we're always using cgroupns starting with the next release (#3241).

BenTheElder · 2023-05-24T19:52:38Z

We also should get around to fixing the horrible dind setup that the main Kubernetes CI is running (which is itself kubernetes pods), but similarly considering if we can get that switched to cgroups v2 first (kubernetes/k8s.io#5276) and just test v1 in github actions without dind for the remaining users that haven't switched yet.

medyagh · 2023-05-25T17:41:39Z

it seems like minikube is already using runc 1.1.7, we have not faced any issues yet (or discovered it yet)
I am curious does the docker desktop on MacOs use cgroup v2 ?

@BenTheElder do you know a specific OS we could try on to see if it would fail for minikube ? the oldest ubuntu on free github action is ubuntu 20.04 (and minikube github action tests run on that) that seems to be cgroup v1

BenTheElder · 2023-05-25T17:47:01Z

I am curious does the docker desktop on MacOs use cgroup v2 ?

Yes.

do you know a specific OS we could try on to see if it would fail for minikube ? the oldest ubuntu on free github action is ubuntu 20.04

You won't see cluster bring up fail at least with kind, but once pods have been running for a while things will start to fail (e.g. when running e2e tests container execs will break).

I'm currently developing with a GCE VM on ubuntu 23.04 but doing:

# manually add to linux cmd: `systemd.unified_cgroup_hierarchy=0`
sudo nano /etc/default/grub
# reboot with this config
sudo update-grub
sudo reboot

This ensures a new enough kernel to have misc controller yet putting the distro back on v1.

I was planning to follow up with minikube when we had a solution, there's been some other recent patches for always using cgroupns=private but they're not quite fully baked yet.

For now I'd recommend moving back to 1.1.5, the bug fixes since 1.1.5 are mostly pretty minor currently.

BenTheElder · 2023-05-25T17:49:50Z

If you use Kubernetes without the recent patches to update to runc 1.1.6, (only available in 1.24+ on latest patch versions) the problems are worse. opencontainers/runc#3849

BenTheElder · 2023-05-25T18:50:08Z

Looks like the bind mount hack will do it as expected, need to do some more testing and cleanup.

BenTheElder · 2023-05-25T20:33:39Z

#3255 should resolve this.

The PR body outlines the core necessary parts, the change itself is a bit messy so I've outlined the key approach in the PR body / comments.

medyagh · 2023-05-25T21:12:50Z

an update on minikube side: we could reproduce this bug for minikube, even though we have latest runc version,
we tried it (ubuntu 20.04) though:

# manually add to linux cmd: `systemd.unified_cgroup_hierarchy=0`
sudo nano /etc/default/grub
# reboot with this config
sudo update-grub
sudo reboot

however it worth noting after doing the ^^ the mount grep was still showing cgroupv2. so maybe we failed to make it cgroup v1.

BenTheElder · 2023-05-25T21:30:50Z

To reproduce you also need a new enough kernel to have the misc controller (~5.15, depending on distro patches).

I used 23.04. Also make sure to set unified underGRUB_CMDLINE_LINUX= and save.

bentheelder@cgroups:~$ cat /etc/default/grub
# If you change this file, run 'update-grub' afterwards to update
# /boot/grub/grub.cfg.
# For full documentation of the options in this file, see:
#   info -f grub -n 'Simple configuration'

GRUB_DEFAULT=0
GRUB_TIMEOUT_STYLE=hidden
GRUB_TIMEOUT=0
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=0"

# Uncomment to enable BadRAM filtering, modify to suit your needs
# This works with Linux (no patch required) and with any kernel that obtains
# the memory map information from GRUB (GNU Mach, kernel of FreeBSD ...)
#GRUB_BADRAM="0x01234567,0xfefefefe,0x89abcdef,0xefefefef"

# Uncomment to disable graphical terminal (grub-pc only)
#GRUB_TERMINAL=console

# The resolution used on graphical terminal
# note that you can use only modes which your graphic card supports via VBE
# you can see them in real GRUB with the command `vbeinfo'
#GRUB_GFXMODE=640x480

# Uncomment if you don't want GRUB to pass "root=UUID=xxx" parameter to Linux
#GRUB_DISABLE_LINUX_UUID=true

# Uncomment to disable generation of recovery mode menu entries
#GRUB_DISABLE_RECOVERY="true"

# Uncomment to get a beep at grub start
#GRUB_INIT_TUNE="480 440 1"

spowelljr · 2023-05-25T21:37:40Z

I'll try with an Ubuntu 23.04 machine, previously what I tried was:

$ uname -a
Linux ubuntu-20-agent-5 5.15.0-1034-gcp #42~20.04.1-Ubuntu SMP Thu May 18 05:40:21 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

$ mount | grep cgroup2
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)

# modify grub
$ cat /etc/default/grub | grep "GRUB_CMDLINE_LINUX"
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=0"

$ sudo update-grub
$ sudo reboot

# cgroup2 still showing?
$ mount | grep cgroup2
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)

$ minikube start --kubernetes-version=v1.23.0

# then tried repro based on https://github.com/opencontainers/runc/issues/3849

BenTheElder · 2023-05-25T21:59:56Z

# cgroup2 still showing?
$ mount | grep cgroup2
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)

If cgroup2 is on /sys/fs/cgroup/unified this is not cgroupv2 mode this is cgroup v1 + v2.

Docker etc should still use v1 then, systemd calls this "hybrid" mode (https://systemd.io/CGROUP_DELEGATION/)..

That's expected. You don't need pure v1 mode. You do need misc enabled though.

BenTheElder · 2023-05-25T22:02:26Z

FWIW: I'm not currently reproducing the issue, what I'm looking for is misc in-use, since I already settled on just disabling misc in v1 (see discussion in #3255). But when I was, the reproducer in the runc issue was sufficient.

BenTheElder · 2023-05-25T22:39:00Z

I would also recommend considering #3241 while working on the cgroups support.

It has the downside of raising the minimum docker version to 20.10.0 (2.5 years old), but makes the whole containers-in-containers thing a log cleaner. We get this by default from all major runtimes with the transition to cgroups v2 but as long as users are on v1, v1 with cgroupns on is a lot better. For kind at least that required some additional fixups, for minikube it might be as simple as added --cgroupns=private option to the node containers.

BenTheElder · 2023-05-26T00:35:39Z

kind is on runc 1.1.7 now

alexeadem · 2023-06-07T20:10:38Z

I see all the changes to fix this happened at the kind node base image. I'm using image
hub.docker.com/kindest/node:v1.27.2hub.docker.comkindest/nodev1.27.2 and I'm unable to kubectl exec into the pods
with the following error

kubectl exec -it kindnet-9x65w -n kube-system -- bash
error: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "802ffd1ad2b8344badec3079b44b13619109e7b59e1576dd3816d0c2752661af": OCI runtime exec failed: exec failed: unable to start container process: error adding pid 846 to cgroups: failed to write 846: openat2 /sys/fs/cgroup/unified/kubelet.slice/kubelet-kubepods.slice/kubelet-kubepods-podac27167f_93ea_4991_9fb0_7bd3a9b6c463.slice/cri-containerd-453a4ebc9ef15c15a76e4cea9646bb9638a44ce32c51f300024718822cb05730.scope/cgroup.procs: no such file or directory: unknown

My guess is that newer kind k8s nodes images haven't been updated/rebuild with the new base image?

Is there a workaround at the 0S level I can use to make newer k8s versions to run without this issue in cgroup v1?

I'm using

sudo grubby --update-kernel=`sudo grubby --default-kernel` --args="systemd.unified_cgroup_hierarchy=0"

mount | grep cgroup2
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)

therefore hybrid mode

OS version

NAME="Fedora Linux"
VERSION="38 (Thirty Eight)"
ID=fedora
VERSION_ID=38
VERSION_CODENAME=""
PLATFORM_ID="platform:f38"
PRETTY_NAME="Fedora Linux 38 (Thirty Eight)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:38"
DEFAULT_HOSTNAME="fedora"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f38/system-administrators-guide/"
SUPPORT_URL="https://ask.fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=38
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=38
SUPPORT_END=2024-05-14

6.2.14-300.fc38.x86_64

BenTheElder · 2023-06-07T20:12:28Z

Thanks for the report.

To avoid image changes like this please use the digests as instructed in the release notes (IE @sha256...)

BenTheElder · 2023-06-07T20:39:02Z

To fully debug your environment we'l need a full bug template report with the information requested there.

I suspect this is due to cgroupv1 being used for the cluster nodes without cgroupns=private. In kind v0.20.0 we will force it to always cgroupns=private, but the images were expected to continue to work with cgroupns=host.

My guess is that newer kind k8s nodes images haven't been updated/rebuild with the new base image?

1.27.2 has been, so probably the opposite issue?

In the short term if you use the digest pinning you will be able to use a version predating these base image changes.

Or, you could try the latest kind code at HEAD and see if the cgroupns=private change solves it.
If you're using docker there's a dockerd flag to change the default, or switching to cgroupsv2 (docker and podman switch to cgroupns=private when using cgroupsv2 unified).

alexeadem · 2023-06-07T22:38:04Z

I'm using qbo which uses kind node images. I found the the docker API equivalent

"CgroupnsMode": "private",\

https://docs.docker.com/engine/api/v1.43/#tag/Container/operation/ContainerCreate

Thanks for the quick response and the help. It is all working fine now and I can do kubectl exec without issues.

BenTheElder added the kind/bug Categorizes issue or PR as related to a bug. label May 12, 2023

BenTheElder self-assigned this May 15, 2023

BenTheElder mentioned this issue May 19, 2023

kubelet base environment fixes #3240

Merged

BenTheElder added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label May 19, 2023

BenTheElder mentioned this issue May 19, 2023

enforce private cgroupns #3241

Merged

This was referenced May 21, 2023

Support GPUs #3164

Open

[WIP] misc controller workaround #3246

Closed

This was referenced May 23, 2023

upgrade runc to 1.1.7 and containerd to 1.7.1 #3249

Merged

bump images with containerd 1.7.1 and runc 1.1.7 #3250

Closed

This was referenced May 25, 2023

upgrade base/node image #3256

Merged

Update runc to v1.1.7 kubernetes/kops#15375

Merged

k8s-ci-robot closed this as completed in #3256 May 26, 2023

BenTheElder added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label May 26, 2023

BenTheElder mentioned this issue May 26, 2023

[runc][1.1.6] Adding misc controller to cgroup v1 makes kubelet sad kubernetes/kubernetes#117647

Closed

chrischdi mentioned this issue May 30, 2023

Clusters with v1.27.2 seem to have issues kubernetes-sigs/cluster-api#8764

Closed

aramase mentioned this issue May 31, 2023

chore: helm chart release for driver v1.3.3 Azure/secrets-store-csi-driver-provider-azure#1188

Merged

BenTheElder mentioned this issue Jun 1, 2023

Allow configurable docker network for kind cluster nodes #273

Closed

BenTheElder mentioned this issue Aug 24, 2023

Rancher-Desktop [Alpine] can't create cluster with v0.20.0 [Previously Also Colima] #3277

Closed

liangyuanpeng mentioned this issue Jan 17, 2024

Fix github action e2e test. kubernetes-sigs/apiserver-network-proxy#555

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

upgrading to runc 1.1.6 / 1.1.7 breaks #3223

upgrading to runc 1.1.6 / 1.1.7 breaks #3223

BenTheElder commented May 12, 2023

BenTheElder commented May 15, 2023

BenTheElder commented May 15, 2023 •

edited

Loading

BenTheElder commented May 15, 2023

BenTheElder commented May 19, 2023

BenTheElder commented May 23, 2023

BenTheElder commented May 23, 2023 •

edited

Loading

BenTheElder commented May 24, 2023

BenTheElder commented May 24, 2023

medyagh commented May 25, 2023 •

edited

Loading

BenTheElder commented May 25, 2023

BenTheElder commented May 25, 2023 •

edited

Loading

BenTheElder commented May 25, 2023

BenTheElder commented May 25, 2023

medyagh commented May 25, 2023

BenTheElder commented May 25, 2023

spowelljr commented May 25, 2023

BenTheElder commented May 25, 2023 •

edited

Loading

BenTheElder commented May 25, 2023

BenTheElder commented May 25, 2023

BenTheElder commented May 26, 2023

alexeadem commented Jun 7, 2023 •

edited

Loading

BenTheElder commented Jun 7, 2023

BenTheElder commented Jun 7, 2023

alexeadem commented Jun 7, 2023 •

edited

Loading

upgrading to runc 1.1.6 / 1.1.7 breaks #3223

upgrading to runc 1.1.6 / 1.1.7 breaks #3223

Comments

BenTheElder commented May 12, 2023

BenTheElder commented May 15, 2023

BenTheElder commented May 15, 2023 • edited Loading

BenTheElder commented May 15, 2023

BenTheElder commented May 19, 2023

BenTheElder commented May 23, 2023

BenTheElder commented May 23, 2023 • edited Loading

BenTheElder commented May 24, 2023

BenTheElder commented May 24, 2023

medyagh commented May 25, 2023 • edited Loading

BenTheElder commented May 25, 2023

BenTheElder commented May 25, 2023 • edited Loading

BenTheElder commented May 25, 2023

BenTheElder commented May 25, 2023

medyagh commented May 25, 2023

BenTheElder commented May 25, 2023

spowelljr commented May 25, 2023

BenTheElder commented May 25, 2023 • edited Loading

BenTheElder commented May 25, 2023

BenTheElder commented May 25, 2023

BenTheElder commented May 26, 2023

alexeadem commented Jun 7, 2023 • edited Loading

BenTheElder commented Jun 7, 2023

BenTheElder commented Jun 7, 2023

alexeadem commented Jun 7, 2023 • edited Loading

BenTheElder commented May 15, 2023 •

edited

Loading

BenTheElder commented May 23, 2023 •

edited

Loading

medyagh commented May 25, 2023 •

edited

Loading

BenTheElder commented May 25, 2023 •

edited

Loading

BenTheElder commented May 25, 2023 •

edited

Loading

alexeadem commented Jun 7, 2023 •

edited

Loading

alexeadem commented Jun 7, 2023 •

edited

Loading