perf research: streams by crowlbot · Pull Request #4 · crowlbot/deno

crowlbot · 2026-05-18T00:59:18Z

TL;DR

No high-impact architectural slowdown found in the streams machinery. Across the macro bench and the microbenches, Deno's ReadableStream/WritableStream/TransformStream either beat Node 22 LTS, Node 23, and Bun, or are within ~1.5× of them on small-construction microbenches.

This PR exists to record the negative result so a future session does not re-investigate.

Headline ratios

16 MB body through `pipeThrough(TransformStream uppercase) → pipeTo(sink)`, 20 iters

| Runtime | MB/s | Ratio vs Deno |
| --- | --- | --- |
| Deno 2.7.14 | 440.7 | 1.00× |
| Bun 1.1.43 | 296.8 | Deno 1.48× faster |
| Node 23.7.0 | 243.2 | Deno 1.81× faster |
| Node 22.13.1 | 221.3 | Deno 1.99× faster |
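
For readers without the repo checked out, the macro pipeline described above can be approximated as follows. This is a minimal sketch, not the contents of micro/streams_macro.js: the 64 KiB chunk size, the reading of "16 MB" as 16 MiB, the fill byte, and the helper names (`makeSource`, `makeUppercase`, `runOnce`, `runMacro`) are all assumptions made for illustration. It relies only on the standard `ReadableStream`, `TransformStream`, `WritableStream`, and `performance` globals.

```javascript
// Sketch only: chunk size, fill byte, and all helper names are assumptions,
// not taken from micro/streams_macro.js.
const TOTAL_BYTES = 16 * 1024 * 1024; // "16 MB" read as 16 MiB (assumption)
const CHUNK_BYTES = 64 * 1024;        // assumed chunk size
const ITERS = 20;

// Source that emits the same lowercase-filled chunk until totalBytes are sent.
function makeSource(totalBytes, chunkBytes) {
  const chunk = new Uint8Array(chunkBytes).fill(0x61); // 'a'
  let sent = 0;
  return new ReadableStream({
    pull(controller) {
      if (sent >= totalBytes) {
        controller.close();
        return;
      }
      const n = Math.min(chunkBytes, totalBytes - sent);
      controller.enqueue(n === chunkBytes ? chunk : chunk.subarray(0, n));
      sent += n;
    },
  });
}

// User transform: per-byte ASCII uppercase into a freshly allocated buffer.
function makeUppercase() {
  return new TransformStream({
    transform(chunk, controller) {
      const out = new Uint8Array(chunk.length);
      for (let i = 0; i < chunk.length; i++) {
        const b = chunk[i];
        out[i] = b >= 0x61 && b <= 0x7a ? b - 0x20 : b;
      }
      controller.enqueue(out);
    },
  });
}

// One timed pass through pipeThrough(uppercase) then pipeTo(sink).
async function runOnce(totalBytes, chunkBytes) {
  let received = 0;
  let lastByte = 0;
  const sink = new WritableStream({
    write(chunk) {
      received += chunk.length;
      lastByte = chunk[chunk.length - 1];
    },
  });
  const t0 = performance.now();
  await makeSource(totalBytes, chunkBytes).pipeThrough(makeUppercase()).pipeTo(sink);
  return { received, lastByte, ms: performance.now() - t0 };
}

// Repeat the pass and report aggregate throughput in MiB/s.
async function runMacro(totalBytes = TOTAL_BYTES, chunkBytes = CHUNK_BYTES, iters = ITERS) {
  let totalMs = 0;
  let received = 0;
  for (let i = 0; i < iters; i++) {
    const r = await runOnce(totalBytes, chunkBytes);
    totalMs += r.ms;
    received += r.received;
  }
  return { received, mbPerSec: received / (1024 * 1024) / (totalMs / 1000) };
}

runMacro().then((r) => console.log(`${r.mbPerSec.toFixed(1)} MB/s`));
```

The source reuses one pre-filled chunk so that the timed region is dominated by the transform and the streams plumbing rather than by source-side allocation. Absolute numbers from this sketch should not be expected to match the table.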

Microbench excerpts (ns/op, lower is better)

| Bench | Deno | Node 22 | Bun |
| --- | --- | --- | --- |
| `ts_construct` | 7,277 | 23,629 | 9,492 |
| `rs_read_256x4k` | 173,641 | 328,503 | 155,234 |
| `rs_pipethrough_identity_256x4k` | 1,096,835 | 1,967,810 | 1,480,243 |
| `rs_pipeto_sink_256x4k` | 743,680 | 1,169,643 | 769,557 |
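
To make the ns/op figures concrete, here is a rough sketch of how two of these ops could be timed. It is not the harness in micro/streams_micro.js: the `bench` and `readAll` helpers, the warmup and iteration counts, and the reading of `256x4k` as 256 chunks of 4 KiB are assumptions.

```javascript
// Sketch only: helper names, iteration counts, and the interpretation of
// "256x4k" as 256 chunks of 4 KiB are assumptions.
const CHUNKS = 256;
const CHUNK_BYTES = 4096;
const sharedChunk = new Uint8Array(CHUNK_BYTES);

// rs_read-style op: build a stream of 256 chunks and drain it with a reader.
async function readAll() {
  let i = 0;
  const rs = new ReadableStream({
    pull(controller) {
      if (i++ < CHUNKS) controller.enqueue(sharedChunk);
      else controller.close();
    },
  });
  const reader = rs.getReader();
  let bytes = 0;
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    bytes += value.length;
  }
  return bytes;
}

// Time an op (sync or async) and report mean nanoseconds per call.
async function bench(name, op, { warmup = 50, iters = 200 } = {}) {
  const call = async () => {
    const r = op();
    if (r && typeof r.then === "function") await r;
  };
  for (let i = 0; i < warmup; i++) await call();
  const t0 = performance.now();
  for (let i = 0; i < iters; i++) await call();
  const nsPerOp = ((performance.now() - t0) * 1e6) / iters;
  return { name, nsPerOp };
}

(async () => {
  const results = [
    await bench("ts_construct", () => new TransformStream()),
    await bench("rs_read_256x4k", readAll),
  ];
  for (const r of results) console.log(`${r.name}\t${Math.round(r.nsPerOp)} ns/op`);
})();
```

Because `bench` awaits between iterations, very small synchronous ops such as `ts_construct` include some microtask overhead in this sketch; a dedicated harness would likely batch them.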

Full results are in tools/perf_research/streams/profiles/streams_*.json. See tools/perf_research/streams/README.md for the full report (hypotheses considered and ruled out, V8 prof attribution).

Where the time goes

V8 prof on the macro bench (tools/perf_research/streams/profiles/streams_macro.prof.txt):

~50 % of total ticks land in the user transform function (per-byte uppercase loop + new Uint8Array(N)).
~24 % in shared libraries (libc malloc / deno binary — same allocation tail).
~5 % of nonlib ticks in the streams machinery itself (writeAlgorithm, chunkSteps, transformAlgorithm).

The user-allocation tail (new Uint8Array(N) hitting libc malloc) is the same cross-cutting cost already documented in PR #1 (fetch, H3) and PR #3 (text-encoding, H1). Root cause is v8_typed_array_max_size_in_heap = 0 in rusty-v8 — out of streams scope.

Hypotheses considered and ruled out

| # | Hypothesis | Verdict |
| --- | --- | --- |
| H1 | TransformStream per-chunk promise chain too deep | Rejected: 2× faster than Node on macro |
| H2 | pipeThrough copies chunks at the boundary | Rejected: identity pipeThrough is 1.79× faster than Node |
| H3 | BYOB read path is slow | Unranked: bench denoland#8 hangs on Deno and Bun (correctness, not perf) |
| H4 | tee'd reads scale badly | Rejected (informally): macro pipeline wins |
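
H2 was rejected above on throughput grounds. As a complementary check, one could also test directly whether an identity `pipeThrough` hands the sink the same chunk object it was given. The sketch below is an illustration, not part of the PR's benches, and the helper name `identityPreservesChunk` is hypothetical. It assumes the default (no-argument) `TransformStream`, which per the Streams Standard forwards each chunk unchanged.

```javascript
// Sketch only: `identityPreservesChunk` is a hypothetical helper, not part of
// tools/perf_research/streams.
async function identityPreservesChunk() {
  const chunk = new Uint8Array(4096);
  const source = new ReadableStream({
    start(controller) {
      controller.enqueue(chunk);
      controller.close();
    },
  });
  // A TransformStream constructed with no transformer acts as an identity.
  const reader = source.pipeThrough(new TransformStream()).getReader();
  const { value, done } = await reader.read();
  // Reference equality means no copy was made at the pipeThrough boundary.
  return !done && value === chunk;
}

identityPreservesChunk().then((same) =>
  console.log(same ? "same chunk object forwarded" : "chunk was copied or replaced")
);
```

A reference-equality check like this only rules out copying for the identity case; it says nothing about chunks produced by a user transform, which allocate their own output.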

What's not here

Native flamegraph attribution. kernel.perf_event_paranoid = 4 on the bench host and sudo is unavailable, so perf / samply are blocked. JS attribution via V8 --prof is what's in profiles/streams_macro.prof.txt. The same constraint applied to PRs #1 (perf research: fetch) through #3 (perf research: text-encoding) on this fork.
Upstream PR. No graduated upstream fix from this surface: there is nothing landable.

Layout

tools/perf_research/streams/
  README.md                                       full report
  micro/streams_micro.js                          10 ops
  micro/streams_macro.js                          16 MB pipeThrough macro
  profiles/streams_{deno,node22,node23,bun}.json  raw bench output per runtime
  profiles/streams_macro.prof.txt                 V8 --prof process output
  profiles/versions.txt                           runtime versions + host caps

Microbench covers ReadableStream construct/read, TransformStream identity + copy pipeThrough, WritableStream pipeTo sink, BYOB read, async iter, and tee. Benches were not run before the session was transferred — the next worker should run the micro across deno/node/bun, capture a V8 prof, and write up the report per the pattern used in perf-research/fetch (PR 1), perf-research/url (PR 2), and perf-research/text-encoding (PR 3) on the crowlbot/deno fork.

16 MB pipeThrough macro: Deno 440.7 MB/s vs Bun 296.8 / Node 22.13 221.3 / Node 23.7 243.2. Microbench: Deno is faster than Node on every bench that completed, within 1.5x of Bun on construction microbenches and ahead of both Node and Bun on pipeThrough/pipeTo. V8 prof on the macro bench attributes ~50 percent of ticks to user transform code (per-byte loop + new Uint8Array(N)), ~24 percent to shared libraries (libc malloc / deno binary, mostly the same allocation tail), and only ~5 percent of nonlib ticks to streams machinery itself. No high-impact streams architectural slowdown to attack.

crowlbot added 2 commits May 14, 2026 14:52

crowlbot mentioned this pull request May 18, 2026

perf research: structuredClone #5

Draft

crowlbot wants to merge 2 commits into main from perf-research/streams
