Commit c4ad6c5

Allow hostNetwork pods to use user namespaces
1 parent cafbf08 commit c4ad6c5

3 files changed

+420
-0
lines changed

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+kep-number: 5607
+alpha:
+  approver: "@wojtek-t"
Lines changed: 380 additions & 0 deletions
@@ -0,0 +1,380 @@
1+
# KEP-5607: Allow HostNetwork Pods to Use User Namespaces
2+
<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [User Stories (Optional)](#user-stories-optional)
    - [Story 1](#story-1)
  - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
  - [Test Plan](#test-plan)
      - [Prerequisite testing updates](#prerequisite-testing-updates)
      - [Unit tests](#unit-tests)
      - [Integration tests](#integration-tests)
      - [e2e tests](#e2e-tests)
  - [Graduation Criteria](#graduation-criteria)
    - [Alpha](#alpha)
    - [Beta](#beta)
    - [GA](#ga)
  - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
  - [Version Skew Strategy](#version-skew-strategy)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
  - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
  - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
  - [Monitoring Requirements](#monitoring-requirements)
  - [Dependencies](#dependencies)
  - [Scalability](#scalability)
  - [Troubleshooting](#troubleshooting)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
- [Infrastructure Needed (Optional)](#infrastructure-needed-optional)
<!-- /toc -->
38+
39+
## Release Signoff Checklist
40+
41+
<!--
42+
**ACTION REQUIRED:** In order to merge code into a release, there must be an
43+
issue in [kubernetes/enhancements] referencing this KEP and targeting a release
44+
milestone **before the [Enhancement Freeze](https://git.k8s.io/sig-release/releases)
45+
of the targeted release**.
46+
47+
For enhancements that make changes to code or processes/procedures in core
48+
Kubernetes—i.e., [kubernetes/kubernetes], we require the following Release
49+
Signoff checklist to be completed.
50+
51+
Check these off as they are completed for the Release Team to track. These
52+
checklist items _must_ be updated for the enhancement to be released.
53+
-->
54+
55+
Items marked with (R) are required *prior to targeting to a milestone / release*.
56+
57+
- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
58+
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
59+
- [ ] (R) Design details are appropriately documented
60+
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
61+
- [ ] e2e Tests for all Beta API Operations (endpoints)
62+
- [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
63+
- [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
64+
- [ ] (R) Graduation criteria is in place
65+
- [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) within one minor version of promotion to GA
66+
- [ ] (R) Production readiness review completed
67+
- [ ] (R) Production readiness review approved
68+
- [ ] "Implementation History" section is up-to-date for milestone
69+
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
70+
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
71+
72+
<!--
73+
**Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.
74+
-->
75+
76+
[kubernetes.io]: https://kubernetes.io/
77+
[kubernetes/enhancements]: https://git.k8s.io/enhancements
78+
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
79+
[kubernetes/website]: https://git.k8s.io/website
80+
81+
## Summary
82+
83+
This KEP proposes introducing a new feature gate to allow Pods to have both `hostNetwork` enabled and user namespaces enabled (by setting `hostUsers: false`).
84+
85+
## Motivation
86+
87+
The primary motivation is to enhance the security of Kubernetes control plane components. Many control plane components, such as the `kube-apiserver` and `kube-controller-manager`, often run as static Pods and are configured with `hostNetwork: true` so they can bind to node ports or interact directly with the host's network stack.
88+
89+
Currently, a validation rule in the kube-apiserver prevents the combination of `hostNetwork: true` and `hostUsers: false`. This KEP aims to remove that barrier.
90+
91+
### Goals
92+
93+
* Introduce a new, separate alpha feature gate: `UserNamespacesHostNetworkSupport`.
94+
95+
* When this feature gate is enabled, modify the Pod validation logic to allow Pod specs where `spec.hostNetwork` is true and `spec.hostUsers` is false.
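
To make the target combination concrete, the following minimal sketch renders the two Pod spec fields named in the goals as JSON. The `podSpec` struct and `renderSpec` helper are hypothetical stand-ins written for this illustration, not the real Kubernetes API types; only the field names `hostNetwork` and `hostUsers` come from this KEP.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// podSpec is a hypothetical, trimmed-down mirror of the two Pod spec fields
// discussed in this KEP. It is not the real Kubernetes API type.
type podSpec struct {
	HostNetwork bool  `json:"hostNetwork"`
	HostUsers   *bool `json:"hostUsers,omitempty"`
}

// renderSpec serializes the two fields so the resulting spec fragment can be inspected.
func renderSpec(hostNetwork, hostUsers bool) (string, error) {
	b, err := json.Marshal(podSpec{HostNetwork: hostNetwork, HostUsers: &hostUsers})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// hostNetwork: true together with hostUsers: false is the combination
	// that the new feature gate is meant to allow.
	out, err := renderSpec(true, false)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints {"hostNetwork":true,"hostUsers":false}
}
```

With the feature gate disabled, a Pod carrying this fragment under `spec` would be rejected by validation; with it enabled, the Pod would be accepted and handed to the kubelet.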
96+
97+
### Non-Goals
98+
99+
* Including this functionality in the existing `UserNamespacesSupport` feature gate. Because `UserNamespacesSupport` is nearing GA, adding a new, unstable behavior with external dependencies to it would be unwise.
100+
101+
## Proposal
102+
103+
We propose the introduction of a new feature gate named `UserNamespacesHostNetworkSupport`.
104+
105+
When this feature gate is disabled (the default state), the kube-apiserver will maintain the current validation behavior, rejecting any Pod spec that includes both `spec.hostNetwork: true` and `spec.hostUsers: false`.
106+
107+
When the `UserNamespacesHostNetworkSupport` feature gate is enabled, we will relax this validation check.
108+
The kube-apiserver will accept such a Pod spec and pass it on to the kubelet.
109+
At this point, the responsibility for successfully creating and running the Pod shifts to the container runtime.
110+
If the container runtime or low-level runtime (e.g., containerd/runc) does not support this combination, the Pod will remain stuck in the `ContainerCreating` state and report an error event, which is the expected behavior.
111+
112+
This change primarily involves modifying the Pod validation function in `pkg/apis/core/validation/validation.go` to account for the state of the new feature gate.
113+
114+
### User Stories (Optional)
115+
116+
#### Story 1
117+
As a cluster administrator, I want to enable user namespaces for my control plane static Pods (e.g., `kube-apiserver`, `kube-controller-manager`) to follow the principle of least privilege and reduce the attack surface. These Pods need to use `hostNetwork` to interact correctly with the cluster network. By enabling the new feature gate, I can add a critical layer of security isolation to these vital components without changing their networking model.
118+
119+
120+
### Notes/Constraints/Caveats (Optional)
121+
122+
### Risks and Mitigations
123+
124+
125+
## Design Details
126+
127+
The core design change is small: in the kube-apiserver's Pod validation logic, locate the code block that rejects the `hostNetwork: true` and `hostUsers: false` combination, and wrap it in a conditional so that the check only runs when the `UserNamespacesHostNetworkSupport` feature gate is disabled.
```go
func validateHostUsers(spec *core.PodSpec, fldPath *field.Path, opts PodValidationOptions) field.ErrorList {
	allErrs := field.ErrorList{}

	// ... existing validations ...

	// Reject hostNetwork together with hostUsers: false only while the
	// UserNamespacesHostNetworkSupport feature gate is disabled.
	// Note we already validated above spec.SecurityContext is not nil.
	if !utilfeature.DefaultFeatureGate.Enabled(features.UserNamespacesHostNetworkSupport) && spec.SecurityContext.HostNetwork {
		allErrs = append(allErrs, field.Forbidden(fldPath.Child("hostNetwork"), "when `hostUsers` is false"))
	}

	// ... existing validations ...

	return allErrs
}
```
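
The snippet above depends on Kubernetes-internal packages and cannot be run on its own. As an illustration only, the following standalone sketch mirrors the same decision with local stand-in types so that each combination can be exercised. The types `securityContext` and `podSpec`, the `gateEnabled` parameter (standing in for the feature gate lookup), and the early return for Pods that do not set `hostUsers: false` are assumptions made for this sketch.

```go
package main

import "fmt"

// securityContext and podSpec are local stand-ins for the internal API types;
// they are assumptions for this sketch, not the real Kubernetes definitions.
type securityContext struct {
	HostNetwork bool
	HostUsers   *bool
}

type podSpec struct {
	SecurityContext *securityContext
}

// validateHostUsers mirrors the gated check. gateEnabled stands in for the
// UserNamespacesHostNetworkSupport feature gate lookup.
func validateHostUsers(spec podSpec, gateEnabled bool) []string {
	var errs []string
	sc := spec.SecurityContext
	// Assumed precondition: nothing to check unless the Pod opts into a
	// user namespace by setting hostUsers: false.
	if sc == nil || sc.HostUsers == nil || *sc.HostUsers {
		return errs
	}
	// The hostNetwork restriction applies only while the gate is disabled.
	if !gateEnabled && sc.HostNetwork {
		errs = append(errs, "spec.hostNetwork: Forbidden: when `hostUsers` is false")
	}
	return errs
}

func main() {
	no := false
	spec := podSpec{SecurityContext: &securityContext{HostNetwork: true, HostUsers: &no}}
	for _, gate := range []bool{false, true} {
		fmt.Printf("gate=%v errors=%d\n", gate, len(validateHostUsers(spec, gate)))
	}
}
```

Passing the gate state as a parameter keeps the sketch testable without global state; the actual change reads the gate through `utilfeature.DefaultFeatureGate` as shown in the snippet above.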
145+
146+
### Test Plan
147+
148+
[ ] I/we understand the owners of the involved components may require updates to
149+
existing tests to make this code solid enough prior to committing the changes necessary
150+
to implement this enhancement.
151+
152+
##### Prerequisite testing updates
153+
154+
##### Unit tests
155+
156+
- `pkg/apis/core/validation`: `2025-10-03` - `85.1%`
157+
158+
##### Integration tests
159+
160+
##### e2e tests
161+
162+
- Add e2e tests to ensure that pods with the combination of `hostNetwork: true` and `hostUsers: false` can run properly.
163+
164+
### Graduation Criteria
165+
166+
#### Alpha
167+
168+
- The `UserNamespacesHostNetworkSupport` feature gate is implemented and disabled by default.
169+
170+
#### Beta
171+
172+
- Mainstream container runtimes and low-level container runtimes (e.g., containerd/CRI-O, runc/crun) have released generally available versions that support the concurrent use of `hostNetwork` and user namespaces.
173+
- Add e2e tests to ensure feature availability.
174+
- Document the limitations of combining user namespaces and `hostNetwork` (e.g., `CAP_NET_RAW`, `CAP_NET_ADMIN`, and `CAP_NET_BIND_SERVICE` remain restricted).
175+
176+
#### GA
177+
178+
- The feature has been stable in Beta for at least 2 Kubernetes releases.
179+
- Multiple major container runtimes support the feature.
180+
181+
182+
### Upgrade / Downgrade Strategy
183+
184+
Upgrade: After upgrading to a version that supports this KEP, the `UserNamespacesHostNetworkSupport` feature gate can be enabled at any time.
185+
186+
Downgrade: If downgrading to a version that does not support this KEP, the kube-apiserver will revert to strict validation. Pods already running with this combination will be unaffected, but new or updated Pod requests attempting to use this combination will be rejected.
187+
188+
### Version Skew Strategy
189+
190+
A newer kube-apiserver with this feature enabled will accept such a Pod.
191+
192+
An older kubelet will still get the Pod definition from the kube-apiserver.
193+
It will attempt to create the Pod, and the success or failure will depend on the version of the container runtime it is using.
194+
195+
## Production Readiness Review Questionnaire
196+
197+
### Feature Enablement and Rollback
198+
199+
###### How can this feature be enabled / disabled in a live cluster?
200+
201+
- [ ] Feature gate (also fill in values in `kep.yaml`)
202+
- Feature gate name: `UserNamespacesHostNetworkSupport`
203+
- Components depending on the feature gate: `kube-apiserver`
204+
- [ ] Other
205+
- Describe the mechanism:
206+
- Will enabling / disabling the feature require downtime of the control
207+
plane?
208+
- Will enabling / disabling the feature require downtime or reprovisioning
209+
of a node?
210+
211+
###### Does enabling the feature change any default behavior?
212+
No. The behavior only changes when a user explicitly sets both `hostNetwork: true` and `hostUsers: false` in a Pod spec.
213+
The behavior of all existing Pods is unaffected.
214+
215+
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
216+
217+
Yes. It can be disabled by setting the feature gate to false and restarting the kube-apiserver.
218+
This restores the old validation logic.
219+
It will not affect any Pods already running with this combination but will prevent new ones from being created.
220+
221+
###### What happens if we reenable the feature if it was previously rolled back?
222+
The kube-apiserver will once again begin to accept the combination of `hostNetwork: true` and `hostUsers: false`.
223+
This is a stateless change, and reenabling is safe.
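
The enable, rollback, and re-enable answers above can be sketched with a toy model in which validation runs only when a Pod is created. Everything here (the `pod` and `apiServer` types, the `create` method, and the error text) is hypothetical and exists only to illustrate why toggling the gate does not touch Pods that were already accepted.

```go
package main

import (
	"errors"
	"fmt"
)

// pod is a hypothetical, minimal stand-in for a Pod spec.
// HostUsers == false means the Pod requests a user namespace.
type pod struct {
	Name        string
	HostNetwork bool
	HostUsers   bool
}

// apiServer is a toy model: validation runs only at creation time and
// previously stored Pods are never re-validated.
type apiServer struct {
	gateEnabled bool
	stored      []pod
}

var errForbidden = errors.New("hostNetwork is forbidden when hostUsers is false")

// create admits the Pod unless it uses the gated combination while the gate is off.
func (s *apiServer) create(p pod) error {
	if p.HostNetwork && !p.HostUsers && !s.gateEnabled {
		return errForbidden
	}
	s.stored = append(s.stored, p)
	return nil
}

func main() {
	s := &apiServer{gateEnabled: true}
	fmt.Println("enabled:", s.create(pod{Name: "a", HostNetwork: true}))

	s.gateEnabled = false // roll back the enablement
	fmt.Println("disabled:", s.create(pod{Name: "b", HostNetwork: true}))
	fmt.Println("stored after rollback:", len(s.stored))

	s.gateEnabled = true // re-enable
	fmt.Println("re-enabled:", s.create(pod{Name: "c", HostNetwork: true}))
}
```

In this model the Pod admitted before the rollback stays in the store, the Pod submitted during the rollback is rejected, and creation succeeds again after re-enabling, which matches the stateless behavior described above.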
224+
225+
###### Are there any tests for feature enablement/disablement?
226+
227+
### Rollout, Upgrade and Rollback Planning
228+
229+
###### How can a rollout or rollback fail? Can it impact already running workloads?
230+
231+
The [Version Skew Strategy](#version-skew-strategy) section covers this point.
232+
233+
###### What specific metrics should inform a rollback?
234+
235+
N/A
236+
237+
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
238+
239+
This will be validated via manual testing.
240+
241+
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
242+
243+
No.
244+
245+
### Monitoring Requirements
246+
247+
<!--
248+
This section must be completed when targeting beta to a release.
249+
250+
For GA, this section is required: approvers should be able to confirm the
251+
previous answers based on experience in the field.
252+
-->
253+
254+
###### How can an operator determine if the feature is in use by workloads?
255+
256+
<!--
257+
Ideally, this should be a metric. Operations against the Kubernetes API (e.g.,
258+
checking if there are objects with field X set) may be a last resort. Avoid
259+
logs or events for this purpose.
260+
-->
261+
262+
###### How can someone using this feature know that it is working for their instance?
263+
264+
<!--
265+
For instance, if this is a pod-related feature, it should be possible to determine if the feature is functioning properly
266+
for each individual pod.
267+
Pick one or more of these and delete the rest.
268+
Please describe all items visible to end users below with sufficient detail so that they can verify correct enablement
269+
and operation of this feature.
270+
Recall that end users cannot usually observe component logs or access metrics.
271+
-->
272+
273+
- [ ] Events
274+
- Event Reason:
275+
- [ ] API .status
276+
- Condition name:
277+
- Other field:
278+
- [ ] Other (treat as last resort)
279+
- Details:
280+
281+
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
282+
283+
<!--
284+
This is your opportunity to define what "normal" quality of service looks like
285+
for a feature.
286+
287+
It's impossible to provide comprehensive guidance, but at the very
288+
high level (needs more precise definitions) those may be things like:
289+
- per-day percentage of API calls finishing with 5XX errors <= 1%
290+
- 99% percentile over day of absolute value from (job creation time minus expected
291+
job creation time) for cron job <= 10%
292+
- 99.9% of /health requests per day finish with 200 code
293+
294+
These goals will help you determine what you need to measure (SLIs) in the next
295+
question.
296+
-->
297+
298+
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
299+
300+
<!--
301+
Pick one or more of these and delete the rest.
302+
-->
303+
304+
- [ ] Metrics
305+
- Metric name:
306+
- [Optional] Aggregation method:
307+
- Components exposing the metric:
308+
- [ ] Other (treat as last resort)
309+
- Details:
310+
311+
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
312+
313+
<!--
314+
Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
315+
implementation difficulties, etc.).
316+
-->
317+
318+
### Dependencies
319+
320+
###### Does this feature depend on any specific services running in the cluster?
321+
322+
No.
323+
324+
### Scalability
325+
326+
###### Will enabling / using this feature result in any new API calls?
327+
No.
328+
329+
###### Will enabling / using this feature result in introducing new API types?
330+
No.
331+
332+
###### Will enabling / using this feature result in any new calls to the cloud provider?
333+
No.
334+
335+
###### Will enabling / using this feature result in increasing size or count of the existing API objects?
336+
No.
337+
338+
###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
339+
No.
340+
341+
###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
342+
No.
343+
344+
###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
345+
No.
346+
347+
### Troubleshooting
348+
349+
###### How does this feature react if the API server and/or etcd is unavailable?
350+
No impact on running workloads.
351+
352+
###### What are other known failure modes?
353+
If the container runtime or low-level runtime (e.g., containerd/runc) does not support the combination of hostNetwork and user namespaces, the pod will remain stuck in the `ContainerCreating` state and fail to be created.
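
As a rough aid for spotting this failure mode, the sketch below filters a list of Pod summaries for those that use the gated combination and are still waiting with reason `ContainerCreating`. The `podStatus` struct and `stuckCandidates` function are hypothetical; in a real cluster this information would come from the Pod's spec and container statuses, and a Pod that matches may simply still be starting.

```go
package main

import "fmt"

// podStatus is a hypothetical summary of the fields relevant to this
// failure mode; it is not a Kubernetes API type.
type podStatus struct {
	Name          string
	HostNetwork   bool
	HostUsers     *bool
	WaitingReason string // container waiting reason, e.g. "ContainerCreating"
}

// stuckCandidates returns the names of Pods that combine hostNetwork with a
// user namespace (hostUsers: false) and are waiting in ContainerCreating.
func stuckCandidates(pods []podStatus) []string {
	var names []string
	for _, p := range pods {
		usesUserNS := p.HostUsers != nil && !*p.HostUsers
		if p.HostNetwork && usesUserNS && p.WaitingReason == "ContainerCreating" {
			names = append(names, p.Name)
		}
	}
	return names
}

func main() {
	no, yes := false, true
	pods := []podStatus{
		{Name: "apiserver", HostNetwork: true, HostUsers: &no, WaitingReason: "ContainerCreating"},
		{Name: "web", HostNetwork: false, HostUsers: &no, WaitingReason: "ContainerCreating"},
		{Name: "agent", HostNetwork: true, HostUsers: &yes, WaitingReason: ""},
	}
	fmt.Println(stuckCandidates(pods))
}
```

A match is only a candidate: the Pod's events would still need to be checked to confirm that the runtime rejected the combination.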
354+
355+
###### What steps should be taken if SLOs are not being met to determine the problem?
356+
357+
N/A
358+
359+
## Implementation History
360+
361+
* 2025-10-03: Initial proposal
362+
363+
## Drawbacks
364+
365+
There are no known drawbacks at this time.
366+
367+
368+
## Alternatives
369+
370+
Add this feature to the existing `UserNamespacesSupport` feature gate:
371+
372+
* This was ruled out because the `UserNamespacesSupport` feature is approaching GA, and its functionality should be stable.
373+
Adding a new, externally-dependent, and immature behavior to a nearly-GA feature would introduce unnecessary risk and delays. Keeping the two feature gates separate is cleaner and safer.
374+
375+
Do not implement this feature:
376+
* Users can use `hostPort` as an alternative to `hostNetwork`, but this may disrupt existing environments, because certain privileged containers need to interact directly with the host network stack. Moreover, `hostPort` requires a pre-configured CNI plugin; otherwise, the Pod will fail to start. This limitation is precisely why Kubernetes control plane components continue to rely on `hostNetwork`.
377+
378+
## Infrastructure Needed (Optional)
379+
380+
No new infrastructure needed.
