KEP-5607: Allow hostNetwork pods to use user namespaces #5608
HirazawaUi commented on Oct 3, 2025
- One-line PR description: Allow hostNetwork pods to use user namespaces
- Issue link: Allow HostNetwork Pods to Use User Namespaces #5607
- Other comments:
When the `UserNamespacesHostNetworkSupport` feature gate is enabled, we will relax this validation check.
The kube-apiserver will accept such a Pod spec and pass it on to the kubelet.
At this point, the responsibility for successfully creating and running the Pod shifts to the container runtime.
If the low-level container runtime (e.g., containerd/runc) does not support this combination, the pod will remain stuck in the `ContainerCreating` state and report an exception event, which is the expected behavior.
If we go with this proposal, we should include making it work with containerd/crio/runc.
containerd needs changes for this, I think runc too. I'm unsure about crio and crun. @giuseppe ?
crun supports it. I am not sure about CRI-O but I don't see any explicit check to prevent that combination
This has been added as a graduation requirement for the beta phase.
### User Stories (Optional)

#### Story 1
As a cluster administrator, I want to enable user namespaces for my control plane static Pods (e.g., kube-apiserver, kube-controller-manager) to follow the principle of least privilege and reduce the attack surface. These Pods need to use hostNetwork to interact correctly with the cluster network. By enabling the new feature gate, I can add a critical layer of security isolation to these vital components without changing their networking model.
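To make the story concrete, the sketch below builds the field combination it relies on: `hostNetwork: true` together with `hostUsers: false`. The `podSpec` type and the helper functions are hypothetical stand-ins that mirror only these two fields; they are not the real Kubernetes API types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// podSpec is a hypothetical, trimmed-down stand-in for the real Pod spec type.
// It mirrors only the two fields discussed in this story.
type podSpec struct {
	HostNetwork bool  `json:"hostNetwork"`
	HostUsers   *bool `json:"hostUsers,omitempty"`
}

// controlPlanePodSpec returns a spec that shares the node's network namespace
// while opting out of the host user namespace.
func controlPlanePodSpec() podSpec {
	hostUsers := false
	return podSpec{HostNetwork: true, HostUsers: &hostUsers}
}

// renderSpec serializes the spec the way it would appear in a JSON manifest.
func renderSpec(s podSpec) string {
	out, err := json.Marshal(s)
	if err != nil {
		panic(err) // not expected for this simple struct
	}
	return string(out)
}

func main() {
	fmt.Println(renderSpec(controlPlanePodSpec()))
}
```

Without the feature gate, a spec with this combination is the one the apiserver validation rejects today.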
Yeap, this will need quite some documentation, to make sure users understand that you probably can't bind to privileged ports even if you have the relevant capability, and that the sysctl to change the privileged port range may be ineffective too.
But LGTM
I assume that capabilities such as `CAP_NET_RAW`, `CAP_NET_ADMIN`, and `CAP_NET_BIND_SERVICE` remain restricted. I have also added this as a graduation requirement for the beta phase in the KEP.
Please correct me if I'm wrong, as I am not deeply familiar with this area.
SGTM. I'd say let's document this in alpha, but I don't oppose doing it for beta. I just don't see why we should postpone it :)
During the alpha stage, this feature is not accessible to users, as we are still awaiting runtime support across the board. Additionally, we need to wait until runtime support is in place to finalize the scope of this feature :)
This looks fine for Alpha from PRR perspective. Thanks! /approve PRR
#### GA

- The feature has been stable in Beta for at least 2 Kubernetes releases.
- Multiple major container runtimes support the feature.
One last question: in GA we completely disable the feature gate, so if a runtime does not support the feature, the pod will fail. Is that OK?
Users can optionally enable this feature. Taking the `UserNamespacesSupport` feature it relies on as an example, if the container runtime does not support `UserNamespacesSupport`, the pod will similarly remain stuck in the `ContainerCreating` state. :)
OK, for now we are fine with this. I just want to avoid ending up using the feature gate as a feature flag, similar to what happened with the swap feature.
yeah I am fine with this. We may wait to GA until e.g. all containerd versions k8s chooses to support support this feature (similar to cgroup driver from CRI, or CRI stats)
/assign @mrunalp for approval
/lgtm
/approve
/assign @mrunalp
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: HirazawaUi, mrunalp, SergeyKanzhelev, wojtek-t
/lgtm We have a beta graduation criterion to list all limitations. We will need to decide at that stage whether some additional things will be needed to help customers not break things.

### Risks and Mitigations

If either the container runtime or the underlying container runtime does not support this feature, the container will fail to be created. To mitigate this issue, we will keep this feature in the alpha stage until mainstream container runtimes (containerd/runc) and mainstream underlying container runtimes (runc/crun) both support it, before promoting it to beta.
I like to use `CRI implementations (containerd/crio)` to resolve ambiguity
Ah, yes, that was a typo. I'll fix it in the next PR.
Experimenting in alpha sounds like a good approach here. We need to clearly articulate limitations and supported scenarios. Hopefully we will be able to avoid confusing customers; maybe we will need some new API or API validation.
We could add tests based on "error" scenarios during the alpha phase, but there's one issue with alpha-stage testing: if the container runtime supports this behavior in a future version, those tests designed for "error" scenarios would fail.
For Containerd make sure to follow the process https://containerd.io/keps/ and coordinate the right approach here with the runtime maintainers
Got it, thanks for the reminder! :)
## Design Details

The core design change is very simple: in the apiserver's Pod validation logic, locate the code block that prevents the `hostNetwork: true` and `hostUsers: false` combination, and wrap it in a conditional that only executes the validation if the `UserNamespacesHostNetworkSupport` feature gate is disabled.

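The gating described above can be sketched as follows. This is a minimal illustration, not the actual apiserver code: `podSpec`, `validateHostNetworkUserNS`, `errHostNetworkWithUserNS`, and the plain `gateEnabled` boolean are all hypothetical names, and the real change would consult the feature-gate machinery for `UserNamespacesHostNetworkSupport` rather than take a parameter.

```go
package main

import (
	"errors"
	"fmt"
)

// podSpec is a hypothetical stand-in for the real Pod spec type.
type podSpec struct {
	HostNetwork bool
	HostUsers   *bool // nil means the field was not set
}

// errHostNetworkWithUserNS is a hypothetical error value for this sketch.
var errHostNetworkWithUserNS = errors.New(
	"hostNetwork: true cannot be combined with hostUsers: false")

// validateHostNetworkUserNS rejects the combination only while the gate is
// disabled. With the gate enabled the check is skipped, and success or
// failure is left to the container runtime.
func validateHostNetworkUserNS(spec podSpec, gateEnabled bool) error {
	usesUserNS := spec.HostUsers != nil && !*spec.HostUsers
	if !gateEnabled && spec.HostNetwork && usesUserNS {
		return errHostNetworkWithUserNS
	}
	return nil
}

func main() {
	hostUsers := false
	spec := podSpec{HostNetwork: true, HostUsers: &hostUsers}
	fmt.Println("gate disabled:", validateHostNetworkUserNS(spec, false))
	fmt.Println("gate enabled: ", validateHostNetworkUserNS(spec, true))
}
```

Passing the gate state as an argument keeps the sketch self-contained and makes both branches easy to exercise; it says nothing about how the runtime behaves once the Pod is admitted.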
Hm, this doesn't actually include the changes to crio/containerd/runc/crun. It's confusing because some parts of the KEP mark it as a beta requirement, but this section doesn't touch it at all.
Since I haven't implemented it in the container runtime and the underlying container runtime yet, I haven't included the implementation details for now. If possible, I think we can add it during the beta phase.