Commit 032c3d0 (parent 930d228)
Propose KEP-5116: Streaming response encoding

3 files changed: +380 −0 lines

kep-number: 5116
beta:
  approver: "@jpbetz"

# KEP-5116: Streaming Encoding for LIST Responses

<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
  - [Test Plan](#test-plan)
    - [Prerequisite testing updates](#prerequisite-testing-updates)
    - [Unit tests](#unit-tests)
    - [Integration tests](#integration-tests)
    - [e2e tests](#e2e-tests)
  - [Graduation Criteria](#graduation-criteria)
    - [Beta](#beta)
    - [GA](#ga)
  - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
  - [Version Skew Strategy](#version-skew-strategy)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
  - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
  - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
  - [Monitoring Requirements](#monitoring-requirements)
  - [Dependencies](#dependencies)
  - [Scalability](#scalability)
  - [Troubleshooting](#troubleshooting)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
<!-- /toc -->

## Release Signoff Checklist

Items marked with (R) are required *prior to targeting to a milestone / release*.

- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [x] (R) KEP approvers have approved the KEP status as `implementable`
- [x] (R) Design details are appropriately documented
- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
  - [x] e2e Tests for all Beta API Operations (endpoints)
  - [x] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
  - [x] (R) Minimum Two Week Window for GA e2e tests to prove flake free
- [x] (R) Graduation criteria is in place
  - [x] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
- [x] (R) Production readiness review completed
- [x] (R) Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

## Summary

This KEP proposes implementing streaming encoding for LIST responses served by the Kubernetes API server.
Existing encoders marshal the response into a single block, allocating gigabytes of memory and holding it until the client has read the whole response.
For large LIST responses, this leads to excessive memory consumption in the API server.
Streaming the encoding process can significantly reduce memory usage, improving scalability and cost-efficiency.

## Motivation

The Kubernetes API server's memory usage presents a significant challenge, particularly when dealing with large resources and LIST requests.
Users can easily issue LIST requests that retrieve gigabytes of data, especially with Custom Resource Definitions (CRDs), which often suffer from significant data bloat when encoded in JSON.

Current API server response encoders were designed with smaller responses in mind,
assuming they could allocate the entire response in a single contiguous memory block.
This assumption breaks down with the scale of data returned by large LIST requests.
Even well-intentioned users can create naive controllers that issue multiple concurrent LIST requests without properly handling the responses.
This can lead to the API server holding entire responses in memory for extended periods, sometimes minutes, while waiting for the controller to process them.

The resulting unpredictable memory usage forces administrators to significantly over-provision API server memory to accommodate potential spikes.

### Goals

* Implement streaming encoders for JSON and Protocol Buffer for LIST responses.
* Significantly reduce, and make more predictable, the API server's memory consumption when serving large LIST responses.

### Non-Goals

* Implementing streaming decoders in clients. This KEP focuses on protecting the API server's memory usage. Clients can utilize existing mechanisms like pagination or WatchList to manage large datasets.
* Implementing streaming encoders for all content types (e.g., "as=Table"). This KEP focuses on the most commonly used and resource-intensive content types to address the most impactful cases first.
* Implementing streaming for CBOR encoding at this time. CBOR support will be considered as part of a broader effort related to CBOR serialization in Kubernetes and tracked separately.

## Proposal

This proposal focuses on implementing streaming encoding for JSON and Protocol Buffer (Proto) for LIST responses.
The core idea is to avoid loading the entire LIST response into memory before encoding.
Instead, the encoder will process objects individually, streaming the encoded data to the client.
Assuming we deliver all the necessary testing, we plan to launch the feature directly to Beta.

Encoding items one by one significantly reduces the memory footprint required by the API server.
Given the Kubernetes limit of 1MB per object, the memory overhead per request becomes manageable.
While this approach may increase overall CPU usage and memory allocations,
the trade-off is considered worthwhile due to the substantial reduction in peak memory usage,
leading to improved API server stability and scalability.

Existing JSON and Proto encoding libraries do not natively support streaming.
Therefore, custom streaming encoders will be implemented.
Because we focus on encoding LIST responses, the implementation scope is narrowed,
requiring encoders for only a limited set of Kubernetes API types.
We anticipate approximately 100 lines of code per encoder per type.
Extensive testing, drawing upon test cases developed for the CBOR serialization effort,
will ensure compatibility with existing encoding behavior.

Long term, the goal is for upstream JSON and Proto libraries to natively support streaming encoding.
For JSON, initial exploration and validation using the experimental `json/v2` package has shown
promising results and confirmed its suitability for our requirements.
Further details can be found in [kubernetes/kubernetes#129304](https://github.com/kubernetes/kubernetes/issues/129304#issuecomment-2612704644).

### Risks and Mitigations

## Design Details

Implementing streaming encoders specifically for LIST responses significantly reduces the scope,
allowing us to focus on a limited set of types and avoid the complexities of a fully generic streaming encoder.
The core difference in our approach will be the special handling of the `Items` field within LIST structures.
Instead of encoding the entire `Items` array at once, we will iterate through the array and encode each item individually, streaming the encoded data to the client.

This targeted approach enables the following implementation criteria:

* **Strict Validation:** Before proceeding with streaming encoding,
  the implementation will rigorously validate the Go struct tags of the target type.
  If the tags do not precisely match the expected structure, we will fall back to the standard encoder.
  This precautionary measure prevents incompatibility if structure fields or the encoded representation change.
* **Delegation to Standard Encoder:** The encoding of all fields *other than* `Items`,
  as well as the encoding of each individual item *within* the `Items` array,
  will be delegated to the standard `encoding/json` (for JSON) or `protobuf` (for Proto) packages.
  This leverages the existing, well-tested encoding logic and minimizes the amount of custom code required, reducing the risk of introducing bugs.

The types requiring custom streaming encoders are:

* `*List` types for built-in Kubernetes API resources (e.g., `PodList`, `ConfigMapList`).
* `UnstructuredList` for Custom Resources.
* `runtime.Unknown`, used by the Proto encoder to provide type information.

To further enhance robustness, a static analysis check will be introduced to detect and prevent any inconsistencies in Go struct tags across different `*List` types.
This addresses the concern that not all `*List` types may have perfectly consistent tag definitions.

### Gzip encoding

As pointed out in kubernetes/kubernetes#129334 (discussion_r1938405782),
the current Kubernetes gzip encoding implementation assumes the response is written in a single large chunk,
checking only the size of the first write to determine whether the response is large enough for compression.
This is a bad assumption about internal encoder implementation details and should be fixed regardless.

To ensure gzip compression works well with streaming,
we will precede all encoder changes by fixing the gzip compression.
First, we will add unit tests that prevent subsequent changes from impacting results,
especially around the compression threshold.
Then, we will rewrite the gzip compression to buffer the response and delay the
decision to enable compression until we have either observed enough bytes to hit the threshold
or received the whole response, in which case we can write it without compressing.

### Test Plan

[x] I/we understand the owners of the involved components may require updates to
existing tests to make this code solid enough prior to committing the changes necessary
to implement this enhancement.

##### Prerequisite testing updates

##### Unit tests

We will implement testing following the cases borrowed from the [CBOR test plan](https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/4222-cbor-serializer#test-plan), skipping tests that do not apply to streaming *encoding*, such as those related to decoding.

Specifically, we will ensure byte-for-byte compatibility with the standard `encoding/json` and `protobuf` encoders for the following cases:

* Preserving the distinction between integers and floating-point numbers.
* Handling structs with duplicate field names (JSON tag names) without producing duplicate keys in the encoded output ([golang/go#17913](https://go.dev/issue/17913)).
* Encoding Go strings containing invalid UTF-8 sequences without error.
* Preserving the distinction between absent, present-but-null, and present-and-empty states for slices and maps.
* Preserving raw bytes.

Fuzz tests will cover the custom streaming encoders for the types with overwritten encoders:

* `testingapigroup.CarpList` as a surrogate for built-in types
* `UnstructuredList`

The skipped tests are primarily related to decoding or CBOR-specific features, which are not relevant to the streaming encoding of JSON and Proto addressed by this KEP.

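The absent/null/empty distinction that the streaming encoder must reproduce byte-for-byte is visible in how `encoding/json` treats slice fields; the struct names below are illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encoding/json emits different bytes for a nil slice with omitempty,
// a nil slice without it, and an empty non-nil slice. A streaming
// encoder that delegates items but frames the list itself must not
// collapse these cases.
type withOmit struct {
	Args []string `json:"args,omitempty"`
}

type withoutOmit struct {
	Args []string `json:"args"`
}

func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(encode(withOmit{}))                    // absent: {}
	fmt.Println(encode(withoutOmit{}))                 // null: {"args":null}
	fmt.Println(encode(withoutOmit{Args: []string{}})) // empty: {"args":[]}
}
```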
##### Integration tests

Given one-to-one compatibility with the standard encoder, we do not expect integration tests between components to be needed.

##### e2e tests

Scalability tests will confirm the improvements and protect against future regressions.
Improvements in resource usage should be noticeable on perf-dash.

The tests will cover the following properties:

* Large resource: 10,000 objects, each 100KB in size.
* List with `RV=0` to ensure the response is served from the watch cache and all the overhead comes from encoder memory allocation.
* Different content types: JSON (default), Proto, CBOR.
* Different resource types: ConfigMap, Pod, Custom Resource.

In the first iteration we expect to overallocate the resources needed for the apiserver to ensure the tests pass;
after the improvement is implemented, we will tune down the resources to detect regressions.

### Graduation Criteria

#### Beta

- Gzip compression supports chunked writes
- All encoder unit tests are implemented
- Streaming encoders for JSON and Proto are implemented
- Scalability tests are running and show improvement

#### GA

- Scalability tests are release blocking

### Upgrade / Downgrade Strategy

We plan to provide byte-for-byte compatibility, so upgrades and downgrades are not affected.

### Version Skew Strategy

We plan to provide byte-for-byte compatibility, so version skew is not a concern.

## Production Readiness Review Questionnaire

### Feature Enablement and Rollback

The feature is controlled via feature gates.

###### How can this feature be enabled / disabled in a live cluster?

- [X] Feature gate (also fill in values in `kep.yaml`)
  - Feature gate names: StreamingCollectionEncodingToJSON, StreamingCollectionEncodingToProto
  - Components depending on the feature gate: kube-apiserver

###### Does enabling the feature change any default behavior?

No, we provide byte-for-byte compatibility with the existing encoders.

###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes, without problems.

###### What happens if we reenable the feature if it was previously rolled back?

###### Are there any tests for feature enablement/disablement?

Yes, this will be covered by unit tests.

### Rollout, Upgrade and Rollback Planning

N/A

###### How can a rollout or rollback fail? Can it impact already running workloads?

N/A

###### What specific metrics should inform a rollback?

N/A

###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

N/A

###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

No

### Monitoring Requirements

###### How can an operator determine if the feature is in use by workloads?

N/A

###### How can someone using this feature know that it is working for their instance?

N/A

###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

N/A

###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

N/A

###### Are there any missing metrics that would be useful to have to improve observability of this feature?

N/A

### Dependencies

###### Does this feature depend on any specific services running in the cluster?

No

### Scalability

###### Will enabling / using this feature result in any new API calls?

No

###### Will enabling / using this feature result in introducing new API types?

No

###### Will enabling / using this feature result in any new calls to the cloud provider?

No

###### Will enabling / using this feature result in increasing size or count of the existing API objects?

No

###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

No, we expect a reduction.

###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

No, we expect a reduction.

###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?

No

### Troubleshooting

###### How does this feature react if the API server and/or etcd is unavailable?

N/A

###### What are other known failure modes?

N/A

###### What steps should be taken if SLOs are not being met to determine the problem?

## Implementation History

## Drawbacks

Maintaining around 500 lines of custom encoder code.

## Alternatives

Wait for `json/v2` to be promoted out of experimental status. This would reduce the maintenance burden, but it comes with even more risk:
the new package introduces breaking changes, and testing showed that even when it is enabled in `v1` compatibility mode there can be problems.

title: Streaming JSON Encoding for LIST Responses
kep-number: 5116
authors:
  - serathius
owning-sig: sig-api-machinery
participating-sigs:
  - sig-aaa
  - sig-bbb
status: implementable
creation-date: 2025-01-31
reviewers:
  - liggit
approvers:
  - deads2k
stage: beta
latest-milestone: "v1.33"
milestone:
  beta: "v1.33"
feature-gates:
  - name: StreamingCollectionEncodingToJSON
    components:
      - kube-apiserver
  - name: StreamingCollectionEncodingToProto
    components:
      - kube-apiserver
