
[WIP] RangeStream#21358

Open
Jefftree wants to merge 8 commits into etcd-io:main from Jefftree:range-stream

Conversation


@Jefftree Jefftree commented Feb 24, 2026

RangeStream for etcd

Benchmark Results

Setup

  • etcd single node on localhost
  • All modes accumulate full results in client memory for fair comparison
  • Client peak memory: max HeapInuse sampled at 1ms via runtime.ReadMemStats
  • Server peak memory: max go_memstats_heap_inuse_bytes from /metrics sampled at 50ms

Small values (100-byte values, 1 client, 10 iterations)

All comparisons relative to Stream (count once).

100k keys (~10 MB total data)

| Mode | Latency | Client Mem | Server Mem | Throughput |
|------|---------|------------|------------|------------|
| Stream (count once) | 0.10s | 38 MB | 52 MB | 117 MB/s |
| Single-shot unary | 0.10s | 54 MB (1.4x worse) | 110 MB (2.1x worse) | 114 MB/s |
| Paginated unary (10k) | 0.15s (1.5x worse) | 38 MB | 52 MB | 90 MB/s (1.3x worse) |
| Stream (count always) | 0.13s (1.3x worse) | 39 MB | 51 MB | 88 MB/s (1.3x worse) |

500k keys (~50 MB total data)

| Mode | Latency | Client Mem | Server Mem | Throughput |
|------|---------|------------|------------|------------|
| Stream (count once) | 0.64s | 141 MB | 190 MB | 98 MB/s |
| Single-shot unary | 0.60s | 253 MB (1.8x worse) | 498 MB (2.6x worse) | 98 MB/s |
| Paginated unary (10k) | 1.68s (2.6x worse) | 140 MB | 179 MB | 33 MB/s (2.9x worse) |
| Stream (count always) | 1.95s (3.0x worse) | 141 MB | 175 MB | 34 MB/s (2.9x worse) |

1M keys (~100 MB total data)

| Mode | Latency | Client Mem | Server Mem | Throughput |
|------|---------|------------|------------|------------|
| Stream (count once) | 1.37s | 271 MB | 348 MB | 90 MB/s |
| Single-shot unary | 1.39s | 494 MB (1.8x worse) | 1000 MB (2.9x worse) | 89 MB/s |
| Paginated unary (10k) | 5.44s (4.0x worse) | 269 MB | 339 MB | 22 MB/s (4.2x worse) |
| Stream (count always) | 6.78s (4.9x worse) | 270 MB | 329 MB | 19 MB/s (4.8x worse) |

2M keys (~200 MB total data)

| Mode | Latency | Throughput |
|------|---------|------------|
| Stream (count once) | 2.76s | 86 MB/s |
| Single-shot unary | 2.79s | 85 MB/s |
| Paginated unary (10k) | 19.21s (7.0x worse) | 12 MB/s (6.9x worse) |
| Stream (count always) | 23.27s (8.4x worse) | 10 MB/s (8.4x worse) |

Large values (10k keys x 100KB values, ~1 GB total data)

1 client, 1 iteration

| Mode | Latency | Client Mem | Server Mem | Throughput |
|------|---------|------------|------------|------------|
| Stream (count once) | 0.48s | 1.01 GB | 1.03 GB | 1.99 GB/s |
| Single-shot unary | 1.38s (2.9x worse) | 2.92 GB (2.9x worse) | 2.47 GB (2.4x worse) | 710 MB/s (2.8x worse) |

5 clients, 50 iterations

| Mode | P50 Latency | P90 Latency | Client Mem | Server Mem | Throughput |
|------|-------------|-------------|------------|------------|------------|
| Stream | 1.81s | 8.33s | 5.01 GB | 1.06 GB | 383 MB/s |
| Single-shot | 28.30s (15.6x worse) | 42.12s (5.1x worse) | 13.91 GB (2.8x worse) | 16.78 GB (15.8x worse) | 37 MB/s (10.5x worse) |

With concurrent clients and large values, single-shot collapses: 5 in-flight ~1 GB responses overwhelm both client and server memory, causing GC thrashing. Stream stays comfortable at ~1 GB server memory since chunks are sent and freed incrementally.

Key observations

  1. Stream matches single-shot on latency and throughput for small values while using 1.8x less client memory and 2.9x less server memory at 1M keys.
  2. With large values, stream wins on every metric — 2.9x faster, 2.9x less client memory, 2.4x less server memory.
  3. Under concurrency with large values, stream's advantage grows dramatically: 15.6x lower P50 latency and 15.8x less server memory. Single-shot's memory footprint causes GC thrashing and swapping.
  4. Paginated unary has no advantages — slower than stream at every size (1.5x at 100k, 7x at 2M), with the gap growing superlinearly due to per-RPC overhead compounding across hundreds of pages.
  5. Count-always is expensive — computing total count on every chunk via O(n) b-tree scan makes stream count-always slower than paginated at 1M+ keys. Counting once at the start is critical.
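
The count-once accumulation the client performs can be sketched as below. Types and the `mergeChunk` helper are illustrative stand-ins (the real merge logic is `mergeRangeStreamChunk` in `client/v3/kv.go`, operating on etcdserverpb messages): only the first chunk carries the authoritative total count, and later chunks just append key-values.

```go
package main

import "fmt"

// Minimal stand-in types; the real ones are etcdserverpb messages.
type KeyValue struct{ Key, Value []byte }

type RangeChunk struct {
	Kvs   []KeyValue
	Count int64 // total matching keys; populated only on the first chunk
	More  bool  // whether further chunks follow
}

type RangeResult struct {
	Kvs   []KeyValue
	Count int64
}

// mergeChunk folds one streamed chunk into the accumulated result,
// following the count-once scheme: the first chunk sets the total,
// every chunk appends its key-values.
func mergeChunk(acc *RangeResult, chunk RangeChunk, first bool) {
	if first {
		acc.Count = chunk.Count
	}
	acc.Kvs = append(acc.Kvs, chunk.Kvs...)
}

func main() {
	chunks := []RangeChunk{
		{Kvs: []KeyValue{{Key: []byte("a")}, {Key: []byte("b")}}, Count: 3, More: true},
		{Kvs: []KeyValue{{Key: []byte("c")}}, More: false},
	}
	var res RangeResult
	for i, c := range chunks {
		mergeChunk(&res, c, i == 0)
	}
	fmt.Println(len(res.Kvs), res.Count) // prints: 3 3
}
```

This is why observation 5 matters: if every chunk recomputed `Count`, the server would pay an O(total_keys) b-tree scan per chunk instead of once per request.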

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Jefftree
Once this PR has been reviewed and has the lgtm label, please assign siyuanfoundation for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details: needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot

Hi @Jefftree. Thanks for your PR.

I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@serathius
Member

/ok-to-test

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
Signed-off-by: Jefftree <jeffrey.ying86@live.com>
@codecov

codecov bot commented Feb 24, 2026

Codecov Report

❌ Patch coverage is 63.37449% with 89 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.02%. Comparing base (353dd4e) to head (bb05e9b).
⚠️ Report is 90 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| server/etcdserver/v3_server.go | 48.97% | 46 Missing and 4 partials ⚠️ |
| client/v3/namespace/kv.go | 0.00% | 23 Missing ⚠️ |
| server/etcdserver/api/v3rpc/key.go | 62.50% | 3 Missing and 3 partials ⚠️ |
| client/v3/kv.go | 95.91% | 1 Missing and 1 partial ⚠️ |
| client/v3/leasing/kv.go | 0.00% | 2 Missing ⚠️ |
| client/v3/mock/mockserver/mockserver.go | 0.00% | 2 Missing ⚠️ |
| server/proxy/grpcproxy/adapter/chan_stream.go | 87.50% | 1 Missing and 1 partial ⚠️ |
| server/storage/mvcc/index.go | 71.42% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
| Files with missing lines | Coverage Δ |
|--------------------------|------------|
| client/v3/retry.go | 79.83% <100.00%> (+0.34%) ⬆️ |
| server/etcdserver/apply/backend.go | 74.86% <100.00%> (ø) |
| server/etcdserver/txn/range.go | 98.31% <100.00%> (+6.79%) ⬆️ |
| server/etcdserver/txn/txn.go | 95.27% <100.00%> (ø) |
| ...erver/proxy/grpcproxy/adapter/kv_client_adapter.go | 100.00% <100.00%> (ø) |
| server/proxy/grpcproxy/kv.go | 97.18% <100.00%> (+0.12%) ⬆️ |
| server/storage/mvcc/kv.go | 40.00% <ø> (ø) |
| server/storage/mvcc/kvstore_txn.go | 73.40% <100.00%> (ø) |
| client/v3/kv.go | 95.04% <95.91%> (+0.59%) ⬆️ |
| client/v3/leasing/kv.go | 89.14% <0.00%> (-1.26%) ⬇️ |
| ... and 6 more | |

... and 23 files with indirect coverage changes

```
@@            Coverage Diff             @@
##             main   #21358      +/-   ##
==========================================
- Coverage   68.36%   68.02%   -0.35%
==========================================
  Files         428      426       -2
  Lines       35277    35447     +170
==========================================
- Hits        24118    24112       -6
- Misses       9760     9939     +179
+ Partials     1399     1396       -3
```
Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 353dd4e...bb05e9b. Read the comment docs.

@Jefftree Jefftree force-pushed the range-stream branch 2 times, most recently from e13f765 to 74212ab Compare February 26, 2026 01:52
@Jefftree
Author

/retest

Jefftree added 7 commits March 2, 2026 16:03
Fix interface stubs, mock servers, test logging, tracing, lint
violations, and proto annotations that were missed in the initial
RangeStream commit.

Signed-off-by: Jefftree <jeffrey.ying86@live.com>
Add CountTotal flag to RangeOptions so treeIndex.Revisions() can skip
counting all matching keys when only a limited page is needed. The first
RangeStream chunk uses countTotal=true for the accurate total; subsequent
chunks use countTotal=false, reducing per-page cost from O(total_keys)
to O(limit).

Signed-off-by: Jefftree <jeffrey.ying86@live.com>
Fix EOF handling, data race between channel close and concurrent
SendMsg, and goroutine leak in the pipeStream adapter used by the
gRPC proxy's RangeStream forwarding.

Signed-off-by: Jefftree <jeffrey.ying86@live.com>
… cases

Rewrite TestKVRange to use a table-driven approach with distinct values
and ~25 test cases covering sorts, limits, count-only, keys-only,
prefix, from-key, historical revisions, and min/max revision filters.
Both Unary and Stream paths are exercised via StreamToUnary.

Signed-off-by: Jefftree <jeffrey.ying86@live.com>
- Fix sort condition in rangeStream that was always true, causing all
  requests to hit the unary fallback path instead of streaming.
- Deduplicate header merge logic in client/v3/kv.go into a shared
  mergeRangeStreamChunk helper.
- Fix typo in comment ("send" -> "sent").

Signed-off-by: Jefftree <jeffrey.ying86@live.com>
Signed-off-by: Jefftree <jeffrey.ying86@live.com>
Implement GetStream on the namespace kvPrefix wrapper following the
same pattern as Get: prefix the key and range_end, delegate to the
inner KV, and strip the prefix from response keys in each streamed
chunk.

For the leasing KV wrapper, fall back to the inner KV since per-key
cache does not apply to streaming ranges.

Signed-off-by: Jefftree <jeffrey.ying86@live.com>
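
The per-chunk prefix handling described in the commit above can be sketched as follows. This is an illustrative stand-in, not the actual code in `client/v3/namespace/kv.go`: the type and function names are hypothetical, and the real wrapper operates on etcdserverpb key-values inside each streamed response.

```go
package main

import (
	"bytes"
	"fmt"
)

// kv is a minimal stand-in for an etcd key-value entry.
type kv struct{ Key []byte }

// stripPrefixFromChunk removes the namespace prefix from every key in
// one streamed chunk before handing it to the caller. Keys that do not
// carry the prefix are dropped here for simplicity; in the real wrapper
// every returned key is within the namespaced range by construction.
func stripPrefixFromChunk(kvs []kv, prefix []byte) []kv {
	out := make([]kv, 0, len(kvs))
	for _, item := range kvs {
		if bytes.HasPrefix(item.Key, prefix) {
			out = append(out, kv{Key: item.Key[len(prefix):]})
		}
	}
	return out
}

func main() {
	chunk := []kv{{Key: []byte("ns/foo")}, {Key: []byte("ns/bar")}}
	for _, item := range stripPrefixFromChunk(chunk, []byte("ns/")) {
		fmt.Println(string(item.Key))
	}
	// prints:
	// foo
	// bar
}
```

Because the rewrite is per chunk, the wrapper never needs the whole result set in memory, which preserves the streaming memory profile measured above.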
@Jefftree
Author

Jefftree commented Mar 4, 2026

/retest
