runsc: 'invalid systemd path' error is misleading when host cgroup hierarchy is cgroupfs-rooted #13258

@a7i

Summary

When --systemd-cgroup is set in runsc.toml and CRI hands runsc a cgroupfs-form Linux.CgroupsPath (e.g. /kubepods/burstable/pod<uid>/<id>), runsc fails with:

cannot set up cgroup for root: invalid systemd path: "/kubepods/burstable/pod<uid>"

The error sounds like a parser bug or malformed input. It actually means the host cgroup hierarchy is misconfigured — kubelet is writing pod cgroups to /sys/fs/cgroup/kubepods/ (cgroupfs tree) instead of kubepods.slice/... (systemd tree), even though kubelet itself has cgroupDriver: systemd.

Two small asks here, both narrowly scoped:

  1. Improve the error message so an operator immediately knows where to look.
  2. Add a one-paragraph note + diagnostic command to the Systemd cgroup driver page.

What I originally claimed (and was wrong about)

The first version of this issue argued that --systemd-cgroup "silently bypasses Kubernetes CPU limits" and that #12392's guidance was generally wrong. After testing on a second cluster with identical kubelet/containerd/OS config:

| Cluster | /sys/fs/cgroup/kubepods exists? | OCI cgroupsPath | --systemd-cgroup=true outcome |
| --- | --- | --- | --- |
| Cluster A (works) | no, only kubepods.slice/ | kubepods-burstable-pod<uid_>.slice:cri-containerd:<id> | runsc joins kubepods-burstable-pod<uid_>.slice/cri-containerd-<id>.scope, cpu.max correct, limits enforced |
| Cluster B (broken) | yes, both kubepods/ and kubepods.slice/ | /kubepods/burstable/pod<uid>/<id> | runsc fails with invalid systemd path |

Same kubelet 1.35.2, same containerd 2.2.2, same Ubuntu 22.04.5, same kernel 6.8.0-1055-aws, same kubelet.yaml (cgroupDriver: systemd, kubeReservedCgroup: /runtime, systemReservedCgroup: /system).

The "silent bypass" branch I described required a hand-rolled shim wrapper that translated path-form to slice-form before runsc ran. That's not a stock setup, so I'm walking that part of the report back. --systemd-cgroup does the right thing on a correctly built host; on a misconfigured host it fails loudly. Both behaviors are reasonable.

What's left is a minor UX gap: the loud failure points an operator at the wrong layer.

Why the misleading error costs operator time

invalid systemd path: "/kubepods/burstable/pod<uid>" reads like one of:

  • runsc parser bug
  • containerd handed runsc a malformed string
  • gVisor doesn't support this kubelet/containerd combo

None of those are true. The actual cause is upstream of runsc: something on the host pre-created /sys/fs/cgroup/kubepods/ before kubelet started, kubelet wrote pod cgroups into that pre-existing tree, and CRI faithfully passes the cgroupfs parent down. The fix is on the operator side (find what creates the cgroupfs dir and stop it). Without a hint in the error, the natural debugging path is "is gVisor broken? is containerd broken?" — neither.

runsc/cgroup/cgroup.go::TransformSystemdPath already knows it expects slice:prefix:name, so it could recognize a cgroupfs path when it sees one and say so.
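For context, here is a minimal sketch of how a slice:prefix:name value maps onto the on-disk hierarchy seen on Cluster A. It follows the usual systemd convention that dashes in a slice name encode the parent chain. The helper names expandSlice and systemdCgroupPath are hypothetical and this is not runsc's actual code; the pod UID and container ID are placeholders.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// expandSlice converts a slice unit name into its nested cgroup directory,
// e.g. "a-b.slice" -> "/a.slice/a-b.slice". Dashes encode the parent chain.
// Hypothetical helper for illustration only.
func expandSlice(slice string) (string, error) {
	const suffix = ".slice"
	name := strings.TrimSuffix(slice, suffix)
	if name == slice || name == "" || strings.Contains(slice, "/") {
		return "", fmt.Errorf("not a slice unit name: %q", slice)
	}
	dir, prefix := "/", ""
	for _, part := range strings.Split(name, "-") {
		if part == "" {
			return "", fmt.Errorf("empty component in slice name: %q", slice)
		}
		dir = path.Join(dir, prefix+part+suffix)
		prefix += part + "-"
	}
	return dir, nil
}

// systemdCgroupPath maps "slice:prefix:name" to
// "<expanded slice>/<prefix>-<name>.scope". A cgroupfs-form path has no
// colons, so it is rejected here.
func systemdCgroupPath(cgroupsPath string) (string, error) {
	parts := strings.Split(cgroupsPath, ":")
	if len(parts) != 3 {
		return "", fmt.Errorf("expected slice:prefix:name, got %q", cgroupsPath)
	}
	dir, err := expandSlice(parts[0])
	if err != nil {
		return "", err
	}
	return path.Join(dir, parts[1]+"-"+parts[2]+".scope"), nil
}

func main() {
	// Placeholder pod UID and container ID, made up for illustration.
	p, err := systemdCgroupPath("kubepods-burstable-pod1234_abcd.slice:cri-containerd:deadbeef")
	fmt.Println(p, err)
}
```

The cgroupfs form from Cluster B cannot be mapped this way at all, which is why failing is the right behavior and only the wording of the failure is in question.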

Proposed change to the error

// runsc/cgroup/cgroup.go
func TransformSystemdPath(path, cid string, rootless bool) (string, error) {
    if len(path) == 0 {
        path = fmt.Sprintf(":runsc:%s", cid)
    }
    parts := strings.SplitN(path, ":", 4)
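    if len(parts) != 3 {
        // A leading /kubepods means CRI passed a cgroupfs-form path, so point
        // the operator at the host hierarchy instead of reporting a parse error.
        if strings.HasPrefix(path, "/kubepods") {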
            return "", fmt.Errorf(
                "Linux.CgroupsPath %q is in cgroupfs path form, but --systemd-cgroup expects %q. "+
                "This usually means kubelet is configured with cgroupDriver=systemd but the host "+
                "cgroup hierarchy is cgroupfs-rooted (e.g. /sys/fs/cgroup/kubepods/ exists alongside "+
                "kubepods.slice/). Either disable --systemd-cgroup in runsc.toml or fix the host so "+
                "kubelet builds the kubepods.slice tree.",
                path, "[slice]:[prefix]:[name]")
        }
        return "", fmt.Errorf("invalid systemd path: %q", path)
    }
    ...
}
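If matching on the /kubepods prefix feels too Kubernetes-specific, one alternative is to treat any absolute path without colons as cgroupfs form. The sketch below is a standalone illustration of that variant, not a patch against runsc: looksLikeCgroupfsPath and checkSystemdPath are hypothetical names, the shortened error text is a stand-in, and the sample paths use placeholder IDs.

```go
package main

import (
	"fmt"
	"strings"
)

// looksLikeCgroupfsPath is a hypothetical helper: an absolute path with no
// colons cannot be slice:prefix:name, so it is assumed to be cgroupfs form.
func looksLikeCgroupfsPath(p string) bool {
	return strings.HasPrefix(p, "/") && !strings.Contains(p, ":")
}

// checkSystemdPath reproduces only the validation step discussed above and
// picks a more specific error when the input looks like a cgroupfs path.
func checkSystemdPath(p string) error {
	if len(strings.SplitN(p, ":", 4)) == 3 {
		return nil
	}
	if looksLikeCgroupfsPath(p) {
		return fmt.Errorf("cgroups path %q is in cgroupfs form, but --systemd-cgroup "+
			"expects slice:prefix:name; check the host cgroup hierarchy", p)
	}
	return fmt.Errorf("invalid systemd path: %q", p)
}

func main() {
	for _, p := range []string{
		"kubepods-burstable-pod1234_abcd.slice:cri-containerd:deadbeef",
		"/kubepods/burstable/pod1234/deadbeef",
		"a:b",
	} {
		fmt.Printf("%q -> %v\n", p, checkSystemdPath(p))
	}
}
```

The trade-off is that the broader check also fires for non-Kubernetes callers that pass a cgroupfs path, so the message would need to avoid kubelet-specific wording in that case.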

Proposed doc addition (Systemd cgroup driver page)

Verifying host cgroup hierarchy. --systemd-cgroup requires that kubelet (and any other process managing the cgroup tree) places pod cgroups under systemd slices, not under a cgroupfs path. After creating a pod, run on the node:

PID=$(sudo crictl inspectp <sandbox-id> | jq -r .info.pid)
awk -F'::' '{print $2}' /proc/$PID/cgroup
# Expected (systemd):  /kubepods.slice/kubepods-<qos>.slice/kubepods-<qos>-pod<uid_>.slice/cri-containerd-<id>.scope
# Misconfigured (cgroupfs):  /kubepods/<qos>/pod<uid>/<id>

If the pod cgroup is under /kubepods/... (no .slice), kubelet did not build the systemd hierarchy, and --systemd-cgroup will fail with invalid systemd path: "/kubepods/...". This is upstream of runsc — kubelet detected a pre-existing /sys/fs/cgroup/kubepods directory at startup and used it instead of building kubepods.slice. Investigate what creates that directory before kubelet starts.
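For anyone who wants to script this check rather than eyeball the awk output, here is a rough Go equivalent. It assumes cgroup v2, where /proc/<pid>/cgroup has a single "0::<path>" entry, and it only looks at the first path component. The function names unifiedCgroupPath and hierarchyKind are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// unifiedCgroupPath extracts the cgroup v2 path from the contents of
// /proc/<pid>/cgroup, i.e. the part after "0::". It returns false if no
// such line exists (for example on a cgroup v1 host).
func unifiedCgroupPath(procCgroup string) (string, bool) {
	for _, line := range strings.Split(procCgroup, "\n") {
		if strings.HasPrefix(line, "0::") {
			return strings.TrimPrefix(line, "0::"), true
		}
	}
	return "", false
}

// hierarchyKind classifies a pod cgroup path by its first component:
// "kubepods.slice" is the systemd tree, "kubepods" the cgroupfs tree.
func hierarchyKind(cgPath string) string {
	first := strings.SplitN(strings.TrimPrefix(cgPath, "/"), "/", 2)[0]
	switch first {
	case "kubepods.slice":
		return "systemd"
	case "kubepods":
		return "cgroupfs"
	default:
		return "unknown"
	}
}

func main() {
	if len(os.Args) != 2 {
		fmt.Println("usage: pass the sandbox PID as the only argument")
		return
	}
	data, err := os.ReadFile("/proc/" + os.Args[1] + "/cgroup")
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	p, ok := unifiedCgroupPath(string(data))
	if !ok {
		fmt.Println("no cgroup v2 entry found")
		return
	}
	fmt.Printf("%s -> %s\n", p, hierarchyKind(p))
}
```

Run it with the PID obtained from crictl inspectp as in the commands above; a result of "cgroupfs" corresponds to the misconfigured case.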

Environment

  • runsc: release-20260520.0
  • containerd: 2.2.2
  • kubelet: 1.35.2, cgroupDriver: systemd
  • OS: Ubuntu 22.04, kernel 6.8, cgroup v2 unified

Related

I'm happy to PR the error message change. Doc PR is also straightforward if there's appetite.
