Summary
When --systemd-cgroup is set in runsc.toml and CRI hands runsc a cgroupfs-form Linux.CgroupsPath (e.g. /kubepods/burstable/pod<uid>/<id>), runsc fails with:
cannot set up cgroup for root: invalid systemd path: "/kubepods/burstable/pod<uid>"
The error sounds like a parser bug or malformed input. It actually means the host cgroup hierarchy is misconfigured — kubelet is writing pod cgroups to /sys/fs/cgroup/kubepods/ (cgroupfs tree) instead of kubepods.slice/... (systemd tree), even though kubelet itself has cgroupDriver: systemd.
Two small asks here, both narrowly scoped:
- Improve the error message so an operator immediately knows where to look.
- Add a one-paragraph note + diagnostic command to the Systemd cgroup driver page.
What I originally claimed (and was wrong about)
The first version of this issue argued that --systemd-cgroup "silently bypasses Kubernetes CPU limits" and that #12392's guidance was generally wrong. Testing on a second cluster with identical kubelet/containerd/OS config showed otherwise:
| Cluster | /sys/fs/cgroup/kubepods exists? | OCI cgroupsPath | --systemd-cgroup=true outcome |
| --- | --- | --- | --- |
| Cluster A (works) | no, only kubepods.slice/ | kubepods-burstable-pod<uid_>.slice:cri-containerd:<id> | runsc joins kubepods-burstable-pod<uid_>.slice/cri-containerd-<id>.scope, cpu.max correct, limits enforced |
| Cluster B (broken) | yes, both kubepods/ and kubepods.slice/ | /kubepods/burstable/pod<uid>/<id> | runsc fails with invalid systemd path |
Same kubelet 1.35.2, same containerd 2.2.2, same Ubuntu 22.04.5, same kernel 6.8.0-1055-aws, same kubelet.yaml (cgroupDriver: systemd, kubeReservedCgroup: /runtime, systemReservedCgroup: /system).
The "silent bypass" branch I described required a hand-rolled shim wrapper that translated path-form to slice-form before runsc ran. That's not a stock setup. Walking that part of the report back. --systemd-cgroup does the right thing on a correctly-built host; on a misconfigured host it fails loudly. Both behaviors are reasonable.
What's left is a minor UX gap: the loud failure points an operator at the wrong layer.
Why the misleading error costs operator time
invalid systemd path: "/kubepods/burstable/pod<uid>" reads like one of:
- runsc parser bug
- containerd handed runsc a malformed string
- gVisor doesn't support this kubelet/containerd combo
None of those are true. The actual cause is upstream of runsc: something on the host pre-created /sys/fs/cgroup/kubepods/ before kubelet started, kubelet wrote pod cgroups into that pre-existing tree, and CRI faithfully passes the cgroupfs parent down. The fix is on the operator side (find what creates the cgroupfs dir and stop it). Without a hint in the error, the natural debugging path is "is gVisor broken? is containerd broken?" — neither.
runsc/cgroup/cgroup.go::TransformSystemdPath already knows it expects slice:prefix:name. It can recognize a cgroupfs path when it sees one.
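To make the two forms concrete, here is a minimal standalone sketch of how a slice:prefix:name value maps onto a cgroup directory under systemd's slice naming convention (a-b-c.slice nests under a.slice/a-b.slice). The helper names expandSlice and scopePath are hypothetical and this is not runsc's implementation; the pod UID and container ID are made up.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// expandSlice turns a slice unit name into its cgroup directory, following
// systemd's rule that each dash-separated prefix is a parent slice.
// Hypothetical helper for illustration; not taken from runsc.
func expandSlice(slice string) (string, error) {
	if !strings.HasSuffix(slice, ".slice") || strings.Contains(slice, "/") {
		return "", fmt.Errorf("not a slice unit name: %q", slice)
	}
	name := strings.TrimSuffix(slice, ".slice")
	dir, prefix := "", ""
	for i, part := range strings.Split(name, "-") {
		if part == "" {
			return "", fmt.Errorf("empty component in slice name: %q", slice)
		}
		if i > 0 {
			prefix += "-"
		}
		prefix += part
		dir = path.Join(dir, prefix+".slice")
	}
	return "/" + dir, nil
}

// scopePath maps "slice:prefix:name" to "<slice dirs>/<prefix>-<name>.scope".
func scopePath(cgroupsPath string) (string, error) {
	fields := strings.Split(cgroupsPath, ":")
	if len(fields) != 3 {
		return "", fmt.Errorf("want slice:prefix:name, got %q", cgroupsPath)
	}
	dir, err := expandSlice(fields[0])
	if err != nil {
		return "", err
	}
	return path.Join(dir, fields[1]+"-"+fields[2]+".scope"), nil
}

func main() {
	for _, p := range []string{
		"kubepods-burstable-podabc_123.slice:cri-containerd:deadbeef", // slice form (Cluster A)
		"/kubepods/burstable/podabc/deadbeef",                         // cgroupfs form (Cluster B)
	} {
		out, err := scopePath(p)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(out)
	}
}
```

The first input expands to /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podabc_123.slice/cri-containerd-deadbeef.scope. The second has no colon-separated fields at all, which is why it can be told apart from a merely malformed slice string.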
Proposed change to the error
// runsc/cgroup/cgroup.go
func TransformSystemdPath(path, cid string, rootless bool) (string, error) {
	if len(path) == 0 {
		path = fmt.Sprintf(":runsc:%s", cid)
	}
	parts := strings.SplitN(path, ":", 4)
	if len(parts) != 3 {
		// A cgroupfs-form path means the host hierarchy is wrong, not the input.
		if strings.HasPrefix(path, "/kubepods") {
			return "", fmt.Errorf(
				"Linux.CgroupsPath %q is in cgroupfs path form, but --systemd-cgroup expects %q. "+
					"This usually means kubelet is configured with cgroupDriver=systemd but the host "+
					"cgroup hierarchy is cgroupfs-rooted (e.g. /sys/fs/cgroup/kubepods/ exists alongside "+
					"kubepods.slice/). Either disable --systemd-cgroup in runsc.toml or fix the host so "+
					"kubelet builds the kubepods.slice tree.",
				path, "[slice]:[prefix]:[name]")
		}
		return "", fmt.Errorf("invalid systemd path: %q", path)
	}
	...
}
Proposed doc addition (Systemd cgroup driver page)
Verifying host cgroup hierarchy. --systemd-cgroup requires that kubelet (and any other process managing the cgroup tree) places pod cgroups under systemd slices, not under a cgroupfs path. After creating a pod, run on the node:
PID=$(sudo crictl inspectp <sandbox-id> | jq -r .info.pid)
awk -F'::' '{print $2}' /proc/$PID/cgroup
# Expected (systemd): /kubepods.slice/kubepods-<qos>.slice/kubepods-<qos>-pod<uid_>.slice/cri-containerd-<id>.scope
# Misconfigured (cgroupfs): /kubepods/<qos>/pod<uid>/<id>
If the pod cgroup is under /kubepods/... (no .slice), kubelet did not build the systemd hierarchy, and --systemd-cgroup will fail with invalid systemd path: "/kubepods/...". This is upstream of runsc — kubelet detected a pre-existing /sys/fs/cgroup/kubepods directory at startup and used it instead of building kubepods.slice. Investigate what creates that directory before kubelet starts.
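If the check needs to run from a node health probe rather than by hand, the same classification can be sketched in Go. This is a minimal example that assumes cgroup v2 (a single 0::<path> line in /proc/<pid>/cgroup) and the default kubelet cgroup root; v2Path and classify are hypothetical names, and the sample inputs are made up.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// v2Path extracts the cgroup v2 path from the contents of /proc/<pid>/cgroup,
// i.e. the text after "0::" on the unified-hierarchy line.
func v2Path(procCgroup string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(procCgroup))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "0::") {
			return strings.TrimPrefix(line, "0::"), true
		}
	}
	return "", false
}

// classify labels a pod cgroup path by which tree it lives in.
// Assumes the default cgroup root (no custom cgroupRoot in kubelet config).
func classify(p string) string {
	switch {
	case strings.HasPrefix(p, "/kubepods.slice/"):
		return "systemd"
	case strings.HasPrefix(p, "/kubepods/"):
		return "cgroupfs"
	default:
		return "unknown"
	}
}

func main() {
	samples := []string{
		"0::/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podabc_123.slice/cri-containerd-deadbeef.scope\n",
		"0::/kubepods/burstable/podabc/deadbeef\n",
	}
	for _, s := range samples {
		if p, ok := v2Path(s); ok {
			fmt.Printf("%s: %s\n", classify(p), p)
		}
	}
}
```

In real use the input would come from reading /proc/<pid>/cgroup for the sandbox PID; the samples are inlined here so the sketch runs anywhere.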
Environment
- runsc: release-20260520.0
- containerd: 2.2.2
- kubelet: 1.35.2, cgroupDriver: systemd
- OS: Ubuntu 22.04, kernel 6.8, cgroup v2 unified
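Related
- invalid systemd path symptom, different host-side cause
- getCgroupsPath: only emits slice form when the cgroup parent already ends in .slice
I'm happy to PR the error message change. Doc PR is also straightforward if there's appetite.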