perf research: structuredClone #5

Draft
crowlbot wants to merge 1 commit into main from perf-research/structuredClone

@crowlbot (Owner)

TL;DR

`structuredClone(<primitive>)` is ~3× slower than Node 22 LTS and Node 23, and 6 to 12× slower than Bun. The cause is V8 failing to inline the `structuredClone` JS function body (in `ext/web/13_message_port.js`) because of its size, so every primitive call pays the function-call-boundary cost.

Wrapping the function with a tiny external fast path identical to the one inside `structuredClone` cuts the cost from ~2,857 ns to ~25.5 ns (~112× faster) in Deno. Node and Bun see the same effect (33× and 10× respectively); Deno simply has the largest gap because Node's and Bun's internal serializers handle primitives more cheaply behind the function-call boundary.

Graduating to an upstream perf fix (separate PR): split `structuredClone` into `structuredCloneSlow` plus a small inlinable wrapper. The benchmark and the fix live there.
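A minimal sketch of what such a split could look like, assuming the primitive fast path returns `null`, `undefined`, booleans, numbers, strings and bigints unchanged (symbols are not cloneable, so they must still reach the slow path, which throws). `structuredCloneSlow` is a hypothetical stand-in that delegates to the runtime's global `structuredClone`; in the real fix it would hold the existing ~75-line body. `isCloneablePrimitive` and `structuredCloneFast` are illustrative names, not Deno's.

```javascript
// Hypothetical stand-in for the large existing body (webidl conversion,
// kNotSerializable check, transfer-list handling). Here it simply
// delegates to the runtime's built-in structuredClone.
function structuredCloneSlow(...args) {
  return globalThis.structuredClone(...args);
}

// Primitives that the structured clone algorithm copies by value.
// Symbols are excluded: cloning one must throw a DataCloneError.
function isCloneablePrimitive(value) {
  const type = typeof value;
  return (
    value === null ||
    type === "undefined" ||
    type === "boolean" ||
    type === "number" ||
    type === "string" ||
    type === "bigint"
  );
}

// Small wrapper intended to be cheap enough for the JIT to inline.
function structuredCloneFast(value, options) {
  if (arguments.length >= 1 && options === undefined && isCloneablePrimitive(value)) {
    return value; // primitives clone to themselves
  }
  // Forward the original argument list so arity and type errors still surface.
  return Reflect.apply(structuredCloneSlow, undefined, arguments);
}
```

The design choice is to keep the hot function tiny and push everything else behind a call: `structuredCloneFast(42)` never enters the slow path, while `structuredCloneFast({ a: 1 })` or `structuredCloneFast(Symbol())` behave exactly like the underlying implementation.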

Headline ratios

Primitives — the finding

| Bench (ns/op) | Deno | Node 22 | Node 23 | Bun | Deno vs Node 22 | Deno vs Bun |
|---|---|---|---|---|---|---|
| `clone_number` | 3,289 | 1,079 | 1,052 | 281 | 3.05× slower | 11.7× slower |
| `clone_short_string` | 3,551 | 1,171 | 1,213 | 574 | 3.03× slower | 6.19× slower |
| `clone_boolean` | 3,009 | 1,050 | 1,095 | 281 | 2.87× slower | 10.7× slower |
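The "Deno vs" columns are plain ratios of per-op timings (Deno time divided by the other runtime's time). A quick check using the numbers from the primitives table; the helper name `slowdown` is illustrative:

```javascript
// Per-op timings (ns) copied from the primitives table.
const timings = {
  clone_number: { deno: 3289, node22: 1079, bun: 281 },
  clone_short_string: { deno: 3551, node22: 1171, bun: 574 },
  clone_boolean: { deno: 3009, node22: 1050, bun: 281 },
};

// Ratio above 1 means Deno is slower by that factor.
function slowdown(denoNs, otherNs) {
  return denoNs / otherNs;
}

for (const [bench, t] of Object.entries(timings)) {
  const vsNode = slowdown(t.deno, t.node22).toFixed(2);
  const vsBun = slowdown(t.deno, t.bun).toFixed(2);
  console.log(`${bench}: ${vsNode}x vs Node 22, ${vsBun}x vs Bun`);
}
// clone_number: 3.05x vs Node 22, 11.70x vs Bun
// clone_short_string: 3.03x vs Node 22, 6.19x vs Bun
// clone_boolean: 2.87x vs Node 22, 10.71x vs Bun
```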

Wrapper test — proves the inlining hypothesis

`micro/sc_wrapper_test.js` wraps `structuredClone` with an external fast-path identical to the one inside the function:

| Runtime | `realSC(42)` | `wrappedSC(42)` | Speedup |
|---|---|---|---|
| Deno 2.7.14 | 2,857 ns | 25.5 ns | 112× |
| Node 22.13.1 | 1,038 ns | 30.7 ns | 33× |
| Bun 1.1.43 | 294 ns | 28.0 ns | 10× |

The user-space wrapper has the SAME fast-path logic; it just lives in a function small enough for the JIT (V8 in Deno and Node) to inline.
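The contents of `micro/sc_wrapper_test.js` are not reproduced in this PR description. The sketch below shows one way such a real-vs-wrapped comparison could be structured, assuming a global `structuredClone` and `performance.now()` (both available in Deno, Node and Bun). `nsPerOp`, the warm-up count and the iteration count are illustrative choices, not the settings behind the numbers above.

```javascript
const realSC = globalThis.structuredClone;

// Wrapper with a primitive fast path; objects, symbols and calls with
// options are delegated to the real implementation unchanged.
function wrappedSC(value, options) {
  const type = typeof value;
  if (
    arguments.length >= 1 &&
    options === undefined &&
    (value === null || (type !== "object" && type !== "function" && type !== "symbol"))
  ) {
    return value;
  }
  return Reflect.apply(realSC, undefined, arguments);
}

// Average nanoseconds per call of fn(arg) over `iterations` calls.
function nsPerOp(fn, arg, iterations, warmup = 10_000) {
  let sink = 0;
  // Warm-up so the JIT can optimize before timing starts.
  for (let i = 0; i < warmup; i++) if (fn(arg) === arg) sink++;
  const start = performance.now();
  for (let i = 0; i < iterations; i++) if (fn(arg) === arg) sink++;
  const elapsedMs = performance.now() - start;
  // Consume `sink` so the calls cannot be treated as dead code.
  if (sink !== warmup + iterations) throw new Error("unexpected clone result");
  return (elapsedMs * 1e6) / iterations;
}

// Small default so the sketch finishes quickly; raise it for stable numbers.
const N = 200_000;
console.log(`realSC(42):    ${nsPerOp(realSC, 42, N).toFixed(1)} ns/op`);
console.log(`wrappedSC(42): ${nsPerOp(wrappedSC, 42, N).toFixed(1)} ns/op`);
```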

Full microbench

15 patterns covering primitives, objects, arrays, maps, TypedArrays, ArrayBuffers, DataViews and mixed shapes. Raw output: `profiles/sc_micro_all.log`. Notable extras:

  • `clone_u8_64k`: Deno 65,898 ns vs Node 22 31,715 ns and Bun 27,239 ns, i.e. about 2.1× slower than Node 22 and 2.4× slower than Bun. Path: `02_structured_clone.js:79-134` (ArrayBufferPrototypeSlice + Symbol.toStringTag switch + dead-code WeakMap allocation). Not graduating: it needs a native flamegraph to attribute the cost precisely, and could be folded into a separate cleanup once the dead WeakMap is removed.
  • `clone_array_1000_strs`: Deno is 1.5× faster than Bun here. Strong showing.
  • `clone_u8_1m`: Deno is 1.17× faster than Node 22.

Where the cost lives

  • `ext/web/13_message_port.js:614-690` — `structuredClone(value, options)`. The body is ~75 lines (webidl converter setup, kNotSerializable check, transfer-list dispatch). V8 won't inline this. Every call pays the boundary cost even when the existing internal fast path (`if (arguments.length >= 1 && options === undefined) { ... return value; }`) is taken.
  • `ext/web/02_structured_clone.js:46` — `const objectCloneMemo = new SafeWeakMap()`. Written via `WeakMapPrototypeSet` on every `ArrayBuffer` clone but never read anywhere in the codebase (grepped). Dead allocation per clone.
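To illustrate why dropping a write-only memo is behavior-preserving, here is a simplified sketch. It is not Deno's actual `02_structured_clone.js` code; `cloneArrayBufferWithMemo` and `cloneArrayBuffer` are hypothetical names. Because the memo is only ever written, the returned copy is the same either way; the only difference is the extra `WeakMap` write per clone (plus an entry that keeps the copy reachable while the source buffer is alive).

```javascript
// Before: each clone records source -> copy in a memo that nothing reads.
const objectCloneMemo = new WeakMap();

function cloneArrayBufferWithMemo(buffer) {
  const copy = buffer.slice(0);
  objectCloneMemo.set(buffer, copy); // write-only: never consulted
  return copy;
}

// After: same result, without the per-clone WeakMap write.
function cloneArrayBuffer(buffer) {
  return buffer.slice(0);
}

const src = new Uint8Array([1, 2, 3]).buffer;
const a = new Uint8Array(cloneArrayBufferWithMemo(src));
const b = new Uint8Array(cloneArrayBuffer(src));
console.log(a.join(",") === b.join(",")); // true
```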

V8 prof — primitive hot path

Profile: `profiles/sc_prim.prof.txt`. 5M calls of `structuredClone(i)`, 3,042 ns/op.

Top builtin hits:

  • 583 ticks `CreateShallowObjectLiteral`
  • 333 ticks `LoadIC`
  • 267 ticks `webidl 00_webidl.js:755` (dictionary converter inner)
  • 113 ticks `ObjectAssign`
  • 65 ticks `ArrayIteratorPrototypeNext`

V8 reaches the dictionary converter and `ObjectAssign` in the bottom-up profile — meaning the fast-path return inside `structuredClone` isn't preventing V8 from emitting code for the rest of the body. Splitting the function lets V8 prove the fast path is the only reachable path for a primitive argument.
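A sketch of the profiled workload, reconstructed only from the description above (N calls of `structuredClone(i)`); `runPrimitiveWorkload` is a hypothetical name and the script behind `profiles/sc_prim.prof.txt` may differ. V8's `--prof` flag can be passed directly to Node (then summarized with `node --prof-process` on the generated `isolate-*.log`) or, in Deno, via `deno run --v8-flags=--prof`.

```javascript
// Run n primitive clones; returns a checksum and the average ns per call.
function runPrimitiveWorkload(n) {
  let sink = 0;
  const start = performance.now();
  for (let i = 0; i < n; i++) {
    sink += structuredClone(i); // numbers clone to themselves
  }
  const elapsedMs = performance.now() - start;
  return { sink, nsPerOp: (elapsedMs * 1e6) / n };
}

// The profiled run used 5,000,000 calls; a smaller default keeps this quick.
const N = 100_000;
const { sink, nsPerOp } = runPrimitiveWorkload(N);
console.log(`sink=${sink} ${nsPerOp.toFixed(0)} ns/op`);
```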

Ranked hypotheses

| Rank | Hypothesis | Impact × Confidence | Notes |
|---|---|---|---|
| H1 | V8 cannot inline `structuredClone` because the function body is too large; splitting into `structuredCloneSlow` + a tiny inlinable wrapper recovers the fast path. | high × high | Proven by `sc_wrapper_test.js`: 2,857 ns → 25.5 ns (112× faster) in Deno. Graduating to upstream PR `perf/structuredClone-primitive-fastpath`. |
| H2 | `objectCloneMemo` in `02_structured_clone.js` is dead code (filled, never read). | low × high | Confirmed by grep. Each ArrayBuffer clone allocates an entry that's never consulted. Not graduating on its own (task forbids drive-by changes). |
| H3 | `clone_u8_64k` is ~2× slower than Node/Bun. | medium × medium | The path goes through the giant Symbol.toStringTag switch + WeakMap allocation. Not yet attributed to a specific cost; a native flamegraph is required. Not ranked for action. |

What's not here

Layout

```
tools/perf_research/structuredClone/
README.md full report
micro/sc_micro.js 15 ops
micro/sc_wrapper_test.js the wrapper-vs-real comparison
profiles/sc_micro_all.log raw bench output per runtime
profiles/sc_prim.prof.txt V8 --prof for the primitive hot path
profiles/versions.txt runtime versions + host caps
```

Macro finding: structuredClone(<primitive>) is 3x slower than Node and
~10x slower than Bun. Cause is V8 failing to inline the structuredClone
function body in ext/web/13_message_port.js (too large), so every call
pays the function-boundary cost. Verified by wrapping with an external
fast-path identical to the one inside the function: 2857 ns -> 25 ns
in Deno (112x faster), 1038 ns -> 31 ns in Node, 294 ns -> 28 ns in
Bun. Graduating to an upstream perf fix.