HirazawaUi
Contributor

  • One-line PR description: Allow hostNetwork pods to use user namespaces
  • Other comments:

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 3, 2025
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 3, 2025
@HirazawaUi HirazawaUi force-pushed the kep-5607 branch 2 times, most recently from 6252f70 to f4441d3 Compare October 3, 2025 16:41
When the `UserNamespacesHostNetworkSupport` feature gate is enabled, we will relax this validation check.
The kube-apiserver will accept such a Pod spec and pass it on to the kubelet.
At this point, the responsibility for successfully creating and running the Pod shifts to the container runtime.
If the low-level container runtime (e.g., containerd/runc) does not support this combination, the pod will remain stuck in the `ContainerCreating` state and report an exception event, which is the expected behavior.

Contributor

If we go with this proposal, we should include making it work with containerd/crio/runc.

Member

containerd needs changes for this, I think runc too. I'm unsure about crio and crun. @giuseppe ?

Member

crun supports it. I am not sure about CRI-O, but I don't see any explicit check that prevents that combination.

Contributor Author

This has been added as a graduation requirement for the beta phase.

### User Stories (Optional)

#### Story 1
As a cluster administrator, I want to enable user namespaces for my control plane static Pods (e.g., kube-apiserver, kube-controller-manager) to follow the principle of least privilege and reduce the attack surface. These Pods need to use hostNetwork to interact correctly with the cluster network. By enabling the new feature gate, I can add a critical layer of security isolation to these vital components without changing their networking model.

Contributor

We need to be clear about what is possible with such a combination. For example, this may work for just listening on the host network, but it will probably fail if the pod, even with admin privileges, tries to make changes that the user namespace prevents.

cc: @rata @giuseppe

Member

Yep, this will need quite a bit of documentation, to make sure users understand that they probably can't bind to privileged ports even with the relevant capability, and that even the sysctl that changes the privileged port range may be ineffective.

But LGTM

Contributor Author

I assume that capabilities such as CAP_NET_RAW, CAP_NET_ADMIN, and CAP_NET_BIND_SERVICE remain restricted. I have also added this as a graduation requirement for the beta phase in the KEP.

Please correct me if I'm wrong, as I am not deeply familiar with this area.
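
To make the concern in this thread concrete, here is a minimal sketch of the rule being discussed, under the assumption that the kernel checks `CAP_NET_BIND_SERVICE` against the user namespace that owns the network namespace. With `hostNetwork: true` that owner is the host's initial user namespace, so a capability held only inside the pod's user namespace would not count. The function name and parameters are hypothetical, and this is a simplified model for discussion, not kernel or kubelet code.

```go
package main

import "fmt"

// canBindPort is a simplified, hypothetical model of the privileged-port rule.
// unprivStart stands for net.ipv4.ip_unprivileged_port_start (1024 by default).
// hasCapInNetnsOwner is true only if the process holds CAP_NET_BIND_SERVICE in
// the user namespace that owns the network namespace it is binding in.
func canBindPort(port, unprivStart int, hasCapInNetnsOwner bool) bool {
	// Port 0 asks the kernel for an ephemeral port, which is never privileged.
	if port == 0 || port >= unprivStart {
		return true
	}
	return hasCapInNetnsOwner
}

func main() {
	// Assumed scenario: hostNetwork pod with hostUsers: false. The capability
	// exists only in the pod's user namespace, not in the owner of the host
	// network namespace, so hasCapInNetnsOwner is false.
	for _, port := range []int{0, 80, 443, 8080} {
		fmt.Printf("port %d bindable: %v\n", port, canBindPort(port, 1024, false))
	}
}
```

If this model holds, workloads that only listen on unprivileged ports would be unaffected, which is the kind of limitation the beta documentation would need to spell out.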

Member

SGTM. I'd say let's document this in alpha, but I'm not opposed to doing it for beta. I don't see why we would postpone it :)

Contributor Author

During the alpha stage, this feature is not accessible to users, as we are still awaiting runtime support across the board. Additionally, we need to wait until runtime support is in place to finalize the scope of this feature :)

@HirazawaUi HirazawaUi force-pushed the kep-5607 branch 4 times, most recently from 6fcfeb6 to 19be9bb Compare October 8, 2025 09:42
@HirazawaUi HirazawaUi force-pushed the kep-5607 branch 2 times, most recently from e1b7b09 to 3597a1c Compare October 13, 2025 14:25
@HirazawaUi HirazawaUi force-pushed the kep-5607 branch 3 times, most recently from f71fe90 to ced6f5b Compare October 15, 2025 00:13
@wojtek-t
Member

This looks fine for Alpha from PRR perspective. Thanks!

/approve PRR

#### GA

- The feature has been stable in Beta for at least 2 Kubernetes releases.
- Multiple major container runtimes support the feature.

Member

One last question: at GA we remove the feature gate entirely, so if a runtime does not support the feature, the pod will fail. Is that OK?

Contributor Author

@HirazawaUi HirazawaUi Oct 15, 2025

Users opt in to this feature. Take the UserNamespacesSupport feature it relies on as an example: if the container runtime does not support UserNamespacesSupport, the pod similarly remains stuck in the ContainerCreating state. :)

Member

OK, we are fine with this for now. I just want to avoid ending up using the feature gate as a feature flag, similar to what happened with the swap feature.

Contributor

Yeah, I am fine with this. We may wait to go GA until, e.g., every containerd version that k8s chooses to support also supports this feature (similar to cgroup driver from CRI, or CRI stats).

@HirazawaUi
Contributor Author

/assign @mrunalp for approval

Member

@SergeyKanzhelev SergeyKanzhelev left a comment

/lgtm
/approve

/assign @mrunalp

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 16, 2025
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 16, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: HirazawaUi, mrunalp, SergeyKanzhelev, wojtek-t

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 16, 2025
@SergeyKanzhelev
Member

/lgtm

We have a beta graduation criterion to list all limitations. At that stage we will need to decide whether anything additional is needed to help customers avoid breaking things.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 16, 2025

### Risks and Mitigations

If either the container runtime or the underlying container runtime does not support this feature, the container will fail to be created. To mitigate this issue, we will keep this feature in the alpha stage until mainstream container runtimes (containerd/runc) and mainstream underlying container runtimes (runc/crun) both support it, before promoting it to beta.

Contributor

I like to use CRI implementations (containerd/crio) to resolve ambiguity

Contributor Author

Ah, yes, that was a typo. I'll fix it in the next PR.

Member

@SergeyKanzhelev SergeyKanzhelev Oct 17, 2025

Experimenting in alpha sounds like a good approach here. We need to clearly articulate limitations and supported scenarios. Hopefully we will be able to avoid confusing customers; we may need some new API or API validation.

Contributor Author

We could add tests based on "error" scenarios during the alpha phase, but there's one issue with alpha-stage testing: if the container runtime supports this behavior in a future version, those tests designed for "error" scenarios would fail.

Member

For containerd, make sure to follow the process at https://containerd.io/keps/ and coordinate the right approach here with the runtime maintainers.

Contributor Author

Got it, thanks for the reminder! :)

@k8s-ci-robot k8s-ci-robot merged commit 5002b73 into kubernetes:master Oct 16, 2025
4 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.35 milestone Oct 16, 2025
## Design Details

The core design change is very simple: in the apiserver's Pod validation logic, locate the code block that prevents the `hostNetwork: true` and `hostUsers: false` combination, and wrap it in a conditional that only executes the validation if the `UserNamespacesHostNetworkSupport` feature gate is disabled.
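
As a rough illustration of the paragraph above, the gated check could look something like the following sketch. The `podSpec` type, the error value, and `validateHostNetworkUserNS` are hypothetical stand-ins invented for this example (the real apiserver validation uses different types and helpers), and the feature gate is passed in as a plain boolean rather than read from the actual feature-gate machinery.

```go
package main

import (
	"errors"
	"fmt"
)

// podSpec is a hypothetical, trimmed-down stand-in for the Pod spec fields
// involved in this check. A nil HostUsers means the pod uses the host user
// namespace (the default).
type podSpec struct {
	HostNetwork bool
	HostUsers   *bool
}

var errHostNetworkWithUserNS = errors.New("hostNetwork: true is not allowed when hostUsers is false")

// validateHostNetworkUserNS rejects the hostNetwork + user namespace
// combination only while the gate (assumed here to correspond to
// UserNamespacesHostNetworkSupport) is disabled.
func validateHostNetworkUserNS(spec podSpec, gateEnabled bool) error {
	usesUserNS := spec.HostUsers != nil && !*spec.HostUsers
	if !gateEnabled && spec.HostNetwork && usesUserNS {
		return errHostNetworkWithUserNS
	}
	return nil
}

func main() {
	hostUsers := false
	spec := podSpec{HostNetwork: true, HostUsers: &hostUsers}
	fmt.Println("gate disabled:", validateHostNetworkUserNS(spec, false))
	fmt.Println("gate enabled: ", validateHostNetworkUserNS(spec, true))
}
```

Note that in this sketch passing validation says nothing about whether the container runtime can actually start the pod; that remains the runtime's responsibility, as the KEP text states.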

Contributor

Hm, this doesn't actually include the changes to crio/containerd/runc/crun. It's confusing because some parts of the KEP mark it as a beta requirement, but this section doesn't touch it at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I haven't implemented it in the container runtime and the underlying container runtime yet, I haven't included the implementation details for now. If possible, I think we can add it during the beta phase.
