runsc: 'invalid systemd path' error is misleading when host cgroup hierarchy is cgroupfs-rooted

## Summary

When `--systemd-cgroup` is set in `runsc.toml` and CRI hands runsc a cgroupfs-form `Linux.CgroupsPath` (e.g. `/kubepods/burstable/pod<uid>/<id>`), runsc fails with:

```
cannot set up cgroup for root: invalid systemd path: "/kubepods/burstable/pod<uid>"
```

The error sounds like a parser bug or malformed input. It actually means **the host cgroup hierarchy is misconfigured** — kubelet is writing pod cgroups to `/sys/fs/cgroup/kubepods/` (cgroupfs tree) instead of `kubepods.slice/...` (systemd tree), even though kubelet itself has `cgroupDriver: systemd`.

Two small asks here, both narrowly scoped:

1. Improve the error message so an operator immediately knows where to look.
2. Add a one-paragraph note + diagnostic command to the [Systemd cgroup driver](https://gvisor.dev/docs/user_guide/systemd/) page.

## What I originally claimed (and was wrong about)

The first version of this issue argued that `--systemd-cgroup` "silently bypasses Kubernetes CPU limits" and that `#12392`'s guidance was generally wrong. Testing on a second cluster with **identical** kubelet/containerd/OS config showed otherwise:

| Cluster | `/sys/fs/cgroup/kubepods` exists? | OCI `cgroupsPath` | `--systemd-cgroup=true` outcome |
|---|---|---|---|
| Cluster A (works) | no, only `kubepods.slice/` | `kubepods-burstable-pod<uid_>.slice:cri-containerd:<id>` | runsc joins `kubepods-burstable-pod<uid_>.slice/cri-containerd-<id>.scope`, `cpu.max` correct, limits enforced |
| Cluster B (broken) | yes, both `kubepods/` and `kubepods.slice/` | `/kubepods/burstable/pod<uid>/<id>` | runsc fails with `invalid systemd path` |

Same kubelet 1.35.2, same containerd 2.2.2, same Ubuntu 22.04.5, same kernel `6.8.0-1055-aws`, same `kubelet.yaml` (`cgroupDriver: systemd`, `kubeReservedCgroup: /runtime`, `systemReservedCgroup: /system`).

The "silent bypass" branch I described required a hand-rolled shim wrapper that translated path-form to slice-form before runsc ran. That's not a stock setup. **Walking that part of the report back.** `--systemd-cgroup` does the right thing on a correctly-built host; on a misconfigured host it fails loudly. Both behaviors are reasonable.

What's left is a minor UX gap: the loud failure points an operator at the wrong layer.
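For reference, the Cluster A row can be reproduced by hand. The sketch below assumes the usual systemd rule that a slice named `a-b.slice` nests under `a.slice`, and that the scope is named `<prefix>-<name>.scope`, as the table shows. `expandSlice` and `scopePath` are hypothetical helper names, not runsc functions, and the uid and container id are made-up placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// expandSlice maps a systemd slice unit name to its cgroup directory.
// Each dash-separated prefix of the name is a parent slice, so
// "a-b.slice" lives at "/a.slice/a-b.slice".
// Hypothetical helper for illustration; not a runsc function.
func expandSlice(slice string) (string, error) {
	if !strings.HasSuffix(slice, ".slice") || strings.Contains(slice, "/") {
		return "", fmt.Errorf("not a slice unit name: %q", slice)
	}
	name := strings.TrimSuffix(slice, ".slice")
	var dir, prefix string
	for _, part := range strings.Split(name, "-") {
		if part == "" {
			return "", fmt.Errorf("empty component in slice name: %q", slice)
		}
		prefix += part
		dir += "/" + prefix + ".slice"
		prefix += "-"
	}
	return dir, nil
}

// scopePath resolves a "slice:prefix:name" cgroupsPath to the scope cgroup
// directory, relative to the cgroup root.
func scopePath(cgroupsPath string) (string, error) {
	parts := strings.Split(cgroupsPath, ":")
	if len(parts) != 3 {
		return "", fmt.Errorf("want slice:prefix:name, got %q", cgroupsPath)
	}
	dir, err := expandSlice(parts[0])
	if err != nil {
		return "", err
	}
	return dir + "/" + parts[1] + "-" + parts[2] + ".scope", nil
}

func main() {
	// Placeholder uid and container id, in the slice form Cluster A receives.
	p, err := scopePath("kubepods-burstable-pod1234_abcd.slice:cri-containerd:deadbeef")
	fmt.Println(p, err)

	// The cgroupfs form Cluster B receives has no ":" separators and is rejected.
	_, err = scopePath("/kubepods/burstable/pod1234/deadbeef")
	fmt.Println(err)
}
```

The Cluster B input fails the three-part split, which is the same condition that produces `invalid systemd path` in runsc.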

## Why the misleading error costs operator time

`invalid systemd path: "/kubepods/burstable/pod<uid>"` reads like one of:

- runsc parser bug
- containerd handed runsc a malformed string
- gVisor doesn't support this kubelet/containerd combo

None of those is true. The actual cause is upstream of runsc: something on the host pre-created `/sys/fs/cgroup/kubepods/` before kubelet started, kubelet wrote pod cgroups into that pre-existing tree, and CRI faithfully passes the cgroupfs parent down. The fix is on the operator side (find what creates the cgroupfs dir and stop it). Without a hint in the error, the natural debugging path is "is gVisor broken? is containerd broken?", and the answer to both is no.

`runsc/cgroup/cgroup.go::TransformSystemdPath` already knows it expects `slice:prefix:name`. It can recognize a cgroupfs path when it sees one.

## Proposed change to the error

```go
// runsc/cgroup/cgroup.go
func TransformSystemdPath(path, cid string, rootless bool) (string, error) {
    // An empty path falls back to a default scope named after the container.
    if len(path) == 0 {
        path = fmt.Sprintf(":runsc:%s", cid)
    }
    // The systemd form has exactly three ":"-separated parts: slice:prefix:name.
    parts := strings.SplitN(path, ":", 4)
    if len(parts) != 3 {
        // New: a cgroupfs-form path from kubelet gets a message that points
        // at the host cgroup hierarchy rather than at the path syntax.
        if strings.HasPrefix(path, "/kubepods") {
            return "", fmt.Errorf(
                "Linux.CgroupsPath %q is in cgroupfs path form, but --systemd-cgroup expects %q. "+
                "This usually means kubelet is configured with cgroupDriver=systemd but the host "+
                "cgroup hierarchy is cgroupfs-rooted (e.g. /sys/fs/cgroup/kubepods/ exists alongside "+
                "kubepods.slice/). Either disable --systemd-cgroup in runsc.toml or fix the host so "+
                "kubelet builds the kubepods.slice tree.",
                path, "[slice]:[prefix]:[name]")
        }
        // Unchanged: every other malformed path keeps the existing error.
        return "", fmt.Errorf("invalid systemd path: %q", path)
    }
    ...
}
```
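The `/kubepods` prefix check only catches Kubernetes hosts. A slightly broader variant, offered as an untested assumption rather than as part of the proposed patch, would treat any absolute path without `:` separators as cgroupfs form. The standalone sketch below shows that classification; `pathForm` and `classifyCgroupsPath` are hypothetical names that do not exist in runsc.

```go
package main

import (
	"fmt"
	"strings"
)

// pathForm is a hypothetical classification of Linux.CgroupsPath values.
type pathForm string

const (
	formEmpty    pathForm = "empty"
	formSystemd  pathForm = "systemd (slice:prefix:name)"
	formCgroupfs pathForm = "cgroupfs (absolute path)"
	formUnknown  pathForm = "unknown"
)

// classifyCgroupsPath decides which form a cgroupsPath is in, using the same
// three-part split as the snippet above to recognize the systemd form.
func classifyCgroupsPath(path string) pathForm {
	switch {
	case path == "":
		return formEmpty
	case len(strings.SplitN(path, ":", 4)) == 3:
		return formSystemd
	case strings.HasPrefix(path, "/") && !strings.Contains(path, ":"):
		return formCgroupfs
	default:
		return formUnknown
	}
}

func main() {
	// Placeholder uids and ids; only the shape of each path matters.
	samples := []string{
		"",
		"kubepods-burstable-pod1234_abcd.slice:cri-containerd:deadbeef",
		"/kubepods/burstable/pod1234/deadbeef",
		"/docker/deadbeef",
		"a:b:c:d",
	}
	for _, p := range samples {
		fmt.Printf("%q -> %s\n", p, classifyCgroupsPath(p))
	}
}
```

With a classifier like this, the longer error could be returned for every `formCgroupfs` input, while the Kubernetes-specific hint about `kubepods.slice` would stay behind the `/kubepods` prefix check.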

## Proposed doc addition (Systemd cgroup driver page)

> **Verifying host cgroup hierarchy.** `--systemd-cgroup` requires that kubelet (and any other process managing the cgroup tree) places pod cgroups under systemd slices, not under a cgroupfs path. After creating a pod, run on the node:
>
> ```bash
> PID=$(sudo crictl inspectp <sandbox-id> | jq -r .info.pid)
> awk -F'::' '{print $2}' /proc/$PID/cgroup
> # Expected (systemd):  /kubepods.slice/kubepods-<qos>.slice/kubepods-<qos>-pod<uid_>.slice/cri-containerd-<id>.scope
> # Misconfigured (cgroupfs):  /kubepods/<qos>/pod<uid>/<id>
> ```
>
> If the pod cgroup is under `/kubepods/...` (no `.slice`), kubelet did not build the systemd hierarchy, and `--systemd-cgroup` will fail with `invalid systemd path: "/kubepods/..."`. This is upstream of runsc — kubelet detected a pre-existing `/sys/fs/cgroup/kubepods` directory at startup and used it instead of building `kubepods.slice`. Investigate what creates that directory before kubelet starts.

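The same check can be scripted for use in node health tooling. A minimal sketch, assuming cgroup v2 (a single `0::/path` line in `/proc/<pid>/cgroup`) and the two path shapes listed in the note above; `unifiedCgroup` and `hierarchy` are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// unifiedCgroup returns the cgroup v2 path from the contents of
// /proc/<pid>/cgroup, i.e. the text after "0::" on the unified-hierarchy line.
func unifiedCgroup(procCgroup string) (string, error) {
	for _, line := range strings.Split(procCgroup, "\n") {
		if strings.HasPrefix(line, "0::") {
			return strings.TrimPrefix(line, "0::"), nil
		}
	}
	return "", fmt.Errorf("no cgroup v2 (0::) entry found")
}

// hierarchy reports which tree a pod cgroup path sits in.
func hierarchy(path string) string {
	switch {
	case strings.HasPrefix(path, "/kubepods.slice/"):
		return "systemd"
	case strings.HasPrefix(path, "/kubepods/"):
		return "cgroupfs"
	default:
		return "other"
	}
}

func main() {
	// Pass /proc/<pid>/cgroup for the sandbox; defaults to this process.
	file := "/proc/self/cgroup"
	if len(os.Args) > 1 {
		file = os.Args[1]
	}
	data, err := os.ReadFile(file)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	path, err := unifiedCgroup(string(data))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%s -> %s\n", path, hierarchy(path))
}
```

A result of `cgroupfs` for a sandbox process corresponds to the misconfigured case in the note.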
## Environment

- runsc: `release-20260520.0`
- containerd: `2.2.2`
- kubelet: `1.35.2`, `cgroupDriver: systemd`
- OS: Ubuntu 22.04, kernel 6.8, cgroup v2 unified

## Related

- #12392 — closing advice still correct on hosts with proper systemd hierarchy
- #9580 — host enforces parent limits when sandbox is in the right tree
- #7671 — same `invalid systemd path` symptom, different host-side cause
- containerd `getCgroupsPath`: only emits slice form when the cgroup parent already ends in `.slice`

I'm happy to PR the error message change. Doc PR is also straightforward if there's appetite.

