perf research: text-encoding #3
Draft
crowlbot wants to merge 1 commit into
…8 prof
- micro/text_encoding_micro.js: 14 ops covering encode (tiny/small/medium ASCII + UTF-8 mixed), encodeInto, decode (tiny/small/medium/large ASCII + UTF-8 mixed), stream-mode decode, encoder/decoder construct.
- micro_results.jsonl: Deno 2.7.14 / Node v22.22.2 / Bun 1.3.14 numbers.
- profiles/text_encoding_micro.prof.txt + .v8.log.gz: V8 --prof attribution showing libc malloc/realloc dominate the encode path (24.5 % of total ticks in libc, ~32 % nonlib in malloc/realloc), which confirms the architectural cost of fresh backing stores per encode call.
perf research: text-encoding
Macro performance research on Deno's implementation of `TextEncoder` and `TextDecoder` (including `encodeInto` and stream-mode decoding).

This PR contains only benchmark scripts and committed V8 prof artifacts, with no production code changes. The report below is the deliverable.
Methodology
- ns/op is unreliable, so the headline is same-host ratios vs Node 22 LTS and Bun.
- `--prof` in-process. `perf`/`samply` need `kernel.perf_event_paranoid<=1` but the container is locked at `3` and `sysctl` is denied. JS-side attribution is comprehensive; native-side time is bucketed under "deno binary" and "libc.so.6".
- 14 ops: encode (tiny/small/medium ASCII + UTF-8 mixed), encodeInto, decode at four ASCII sizes (tiny/small/medium/large 1 MB) + UTF-8 mixed, stream-mode decode (5 chunks), and encoder/decoder construction.
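
To make the ns/op and same-host ratio terminology concrete, here is a minimal timing sketch. It is not the harness in `micro/text_encoding_micro.js`; the `bench` and `ratio` helpers and the iteration counts are assumptions chosen for illustration, and a real harness would also need to guard against dead-code elimination.

```javascript
// Hypothetical helpers for illustration only.
function bench(fn, iterations = 100_000) {
  // Warm up so the JIT has a chance to optimize the call before timing.
  for (let i = 0; i < 1_000; i++) fn();
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  const elapsedMs = performance.now() - start;
  return (elapsedMs * 1e6) / iterations; // nanoseconds per operation
}

// Ratio of one runtime's ns/op to a baseline runtime's ns/op for the same op.
function ratio(nsPerOp, baselineNsPerOp) {
  return nsPerOp / baselineNsPerOp;
}

const encoder = new TextEncoder();
const decoder = new TextDecoder();
const small = "Content-Type: application/json";
const smallBytes = encoder.encode(small);

const results = {
  encode_small: bench(() => encoder.encode(small)),
  decode_small: bench(() => decoder.decode(smallBytes)),
};
console.log(results);
```

Running the same script under each runtime on the same host and dividing the per-op numbers with `ratio` gives figures comparable in kind to the headline ratios, without relying on the absolute ns/op.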
Pinned: `deno 2.7.14 (v8 14.7.173.20-rusty)`, `node v22.22.2`, `bun 1.3.14`.

Headline ratios: microbench (ns/op; lower is better)

- `encode("hello world!")` (12 B)
- `encode("Content-Type: application/json")` (30 B)
- `encode("x".repeat(1000))` (1 KB)
- `encode(utf8mixed)` (~50 B mixed)
- `encodeInto(small, dest)`
- `decode(12 B ASCII)`
- `decode(30 B ASCII)`
- `decode(1 KB ASCII)`
- `decode(1 MB ASCII)`
- `decode(utf8mixed)` (~50 B)
- `decode(utf8medium mixed)` (~1 KB)
- `decode (5 stream:true chunks of 30 B)`
- `new TextEncoder()`
- `new TextDecoder()`

Reads:
- `decode(1 MB ASCII)`: 93 µs vs Node's 610 µs (6.5× faster), thanks to the `simdutf::validate_ascii` short-circuit at `ext/web/lib.rs:528-535` and the matching path in core's `op_decode`. This is one of the rare wins documented here; flag for the team.
- `TextEncoder.encode("…")` on small inputs is 20-34× slower than Bun. Bun has an intrinsic Response/encode path; Deno op-dispatches into `op_encode`, which always allocates a fresh shared backing store for the output Uint8Array.
- `new TextDecoder()` is 28× slower than Bun (187 ns vs 7 ns). Every construction op-dispatches through `op_encoding_normalize_label`, even for the default `"utf-8"` case; the non-utf8 case additionally calls `op_encoding_new_decoder` to obtain a `cppgc` resource.
- `decode(chunk, { stream: true })` is 2× slower than Node, 3.9× slower than Bun. Each chunk is an op call that returns a fresh `U16String` (UTF-16 vector) marshaled across the boundary.
- `decode_tiny_ascii`: 118 ns vs 77 ns. The simdutf fast path lives on the ASCII branch, but there's still op-dispatch overhead per call.
Flamegraph attribution (V8 `--prof`)

Full profile: `tools/perf_research/text-encoding/profiles/text_encoding_micro.prof.txt` (raw log: `text_encoding_micro.v8.log.gz`).

Top of `Statistical profiling result` (1 740 ticks):

~32 % of nonlib time is in `__libc_malloc` + `__libc_realloc`. That is the backing-store allocator path firing on every `new Uint8Array(N)` created inside `op_encode`. JS-side cost is small (<10 % combined); essentially all time is in the native encode path and its allocator.
The `__lll_lock_wake_private` / `__lll_lock_wait_private` ticks (12.4 % combined) indicate contention on the libc malloc arena lock: small-buffer allocations going through a global heap lock.
Where the cost lives
| Finding | Location | Detail |
| --- | --- | --- |
| `op_encode` always allocates a fresh shared backing store via `v8::ArrayBuffer::new_backing_store_from_vec` | `libs/core/ops_builtin_v8.rs:517-531` | `TextEncoder.encode("…")` (and every `core.encode(…)` call from fetch body/formdata/`22_body.js`) ends here. Backing store creation goes through V8's `ArrayBuffer::Allocator`, which in rusty-v8 cannot use the on-heap-TypedArray fast path (`v8_typed_array_max_size_in_heap = 0` in `rusty-v8-147.4.0/.gn`). |
| `new TextDecoder()` op-dispatches even for the default `"utf-8"` case | `ext/web/08_text_encoding.js:85`, `:213` | `op_encoding_normalize_label(label)`; non-utf8 case additionally calls `op_encoding_new_decoder` to obtain a cppgc resource. The default UTF-8 path also goes through normalize-label. |
| `decode(chunk, {stream:true})` op-dispatches per chunk and returns a `U16String` (UTF-16 vec) | `ext/web/08_text_encoding.js:219`, `ext/web/lib.rs:631-669` | `vec![0; max_utf16_buffer_length]` for the output, then marshals as `U16String` back across the boundary. The static (non-stream) UTF-8 path has the `simdutf` ASCII fast path; the streaming/non-UTF-8 path does not. |
| … | `libs/core/ops_builtin_v8.rs:526-528` | … |
| `op_encoding_decode_utf8` has an excellent SIMD ASCII fast path; keep this | `ext/web/lib.rs:518-535` | `decode(1 MB ASCII)` is 6.5× faster than Node. Documented here as a positive architectural choice, not a finding to fix. |

Ranked architectural hypotheses
H1: `TextEncoder.encode` allocates a fresh malloc-backed `ArrayBuffer` on every call; on-heap-TypedArray fast path is disabled (HIGH × HIGH)

`encode("hello world!")` (12 B) is 1 324 ns in Deno vs 65 ns in Bun, a 20× gap. V8 prof: ~32 % of nonlib ticks in `__libc_malloc` + `__libc_realloc`, plus 12 % in malloc arena locks.

`libs/core/ops_builtin_v8.rs:517-531` always calls `v8::ArrayBuffer::new_backing_store_from_vec(bytes).make_shared()`, which in rusty-v8 goes through the libc allocator (V8's on-heap TypedArray path is disabled at `v8_typed_array_max_size_in_heap = 0` in the rusty-v8 `.gn`, to preserve embedder pointer stability for op2's `#[buffer]` ABI).

… op2 `#[buffer]` callsites for NoAllocScope. Drops the 20× small-encode gap to near-zero.

… embedder `ArrayBuffer::Allocator`. Less invasive than (A) but doesn't solve the cppgc-pointer-stability concern fully.

Either way: closes the encode gap, drops libc time below 5 %. Cross-cutting with `perf-research/fetch` (every server response body), `perf-research/streams`, `perf-research/structuredClone`, and several other surfaces. This is the single highest-leverage finding across the whole research effort.
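
The per-call allocation behind H1 is observable from user code. The snippet below is a small sketch, not part of the benchmark suite, showing that two `encode` calls on the same input hand back two separate buffers; the variable names are illustrative.

```javascript
const encoder = new TextEncoder();

const a = encoder.encode("hello world!");
const b = encoder.encode("hello world!");

// Every call returns a new Uint8Array. In the runtimes measured here each
// one is expected to sit on its own ArrayBuffer rather than a shared pool.
const sharesBuffer = a.buffer === b.buffer;

console.log({ sharesBuffer, bytes: a.byteLength });
```

Because the result is caller-owned and independently mutable, the runtime cannot simply reuse one buffer across calls, which is why the allocator path shows up on every small encode.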
H2: `new TextDecoder()` op-dispatches even for the default UTF-8 case (MEDIUM × HIGH)

`new TextDecoder()` is 187 ns in Deno vs 75 ns in Node and 7 ns in Bun (28× Bun, 2.5× Node).

`ext/web/08_text_encoding.js:85` calls `op_encoding_normalize_label(label)` unconditionally; later code (line 213) sets up a cppgc resource only for non-UTF-8 cases. The default UTF-8 path still pays the label-normalize op. Bun's TextDecoder presumably elides this entirely on the default-utf8 path.

… common case of `new TextDecoder()` (no args) and `new TextDecoder("utf-8")`. Brings construct to ~10 ns (just the brand assignment). Affects every fetch handler that constructs a TextDecoder per request, every streaming reader, etc. Not a leading cost individually but very common.
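
A JS-side short-circuit for the default label could look roughly like the sketch below. This is not Deno's code: `slowNormalizeLabel` is a hypothetical stand-in for the native `op_encoding_normalize_label` call (here it only counts invocations and knows a few UTF-8 labels), and `normalizeLabel` is an illustrative name.

```javascript
// Hypothetical stand-in for the op call; it counts how often the
// "boundary" would be crossed.
let opCalls = 0;
function slowNormalizeLabel(label) {
  opCalls++;
  // Approximation of label matching: trim and lowercase.
  const normalized = String(label).trim().toLowerCase();
  // Deliberately incomplete: only a few of the UTF-8 labels.
  if (["utf-8", "utf8", "unicode-1-1-utf-8"].includes(normalized)) {
    return "utf-8";
  }
  throw new RangeError(`unsupported label in this sketch: ${label}`);
}

function normalizeLabel(label) {
  // Fast path: the no-argument case and the exact canonical label never
  // reach the slow path.
  if (label === undefined || label === "utf-8") return "utf-8";
  return slowNormalizeLabel(label);
}

console.log(normalizeLabel(), opCalls);         // utf-8 0
console.log(normalizeLabel(" UTF8 "), opCalls); // utf-8 1
```

Only exact matches take the fast path, so unusual spellings such as `" UTF8 "` still go through full normalization and behavior stays unchanged for them.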
H3: `decode(chunk, {stream:true})` allocates `vec![0; max_utf16_len]` per chunk and returns marshaled `U16String` (MEDIUM × HIGH)

`decode_stream_5chunks` is 2.06× Node and 3.92× Bun.

`ext/web/lib.rs:631-669` pre-allocates a UTF-16 vector sized at `max_utf16_buffer_length(data.len())` then truncates. Returns `U16String` which is then marshaled back as a V8 string. Plus the cppgc resource borrow on every call.

… for HTTP body streaming), follow the simdutf ASCII fast path that the non-stream UTF-8 path already has: write straight to a V8 one-byte string and skip the UTF-16 vector. Brings stream decode in line with non-stream decode (~118 ns/chunk instead of 510 ns/chunk in this bench). For binary or non-ASCII streams, the existing path is fine. Affects any code that uses `response.body` + `TextDecoderStream` (a very common SSE / line-streamed-JSON pattern).
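
For readers less familiar with stream-mode decoding, the sketch below shows why the decoder has to carry state between chunks and which chunks would qualify for an ASCII-only path. The `decodeChunks` and `isAscii` helpers and the sample string are assumptions for illustration; they are not taken from the benchmark.

```javascript
const encoder = new TextEncoder();
const bytes = encoder.encode("price: 5 € total");

// Split so that the 3-byte "€" (E2 82 AC) straddles chunk boundaries.
const euroAt = bytes.indexOf(0xe2);
const chunks = [
  bytes.subarray(0, 4),                   // "pric", pure ASCII
  bytes.subarray(4, euroAt + 1),          // ends with the first byte of "€"
  bytes.subarray(euroAt + 1, euroAt + 2), // middle byte of "€"
  bytes.subarray(euroAt + 2),             // last byte of "€" plus " total"
];

function decodeChunks(parts) {
  const decoder = new TextDecoder();
  let out = "";
  // stream: true keeps an incomplete trailing sequence buffered.
  for (const part of parts) out += decoder.decode(part, { stream: true });
  return out + decoder.decode(); // final call flushes any buffered bytes
}

// A chunk is eligible for an ASCII-only path when no byte has the high bit set.
function isAscii(chunk) {
  for (let i = 0; i < chunk.length; i++) {
    if (chunk[i] > 0x7f) return false;
  }
  return true;
}

console.log(decodeChunks(chunks));
console.log(chunks.map(isAscii));
```

An ASCII shortcut would also have to check that the decoder holds no pending partial sequence from the previous chunk, which this sketch does not model.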
H4: `op_encoding_encode_into` already has a fast path; encode does not (MEDIUM × MEDIUM)

`encodeInto(small, dest)` is 40 ns in Deno (faster than Node's 53 ns, matches Bun's 42 ns). But `encode(small)` is 1 512 ns in Deno (about 38× the encodeInto cost on the same input).

`encodeInto` writes to a caller-provided buffer (`ext/web/lib.rs:686-704`): no allocation, no backing-store creation. `encode` always allocates. The asymmetry is large.

… on-heap path. A workaround would be to expose a fast path in `op_encode` that uses the same write-into-caller-buffer pattern when the encoded size is bounded (e.g. ≤256 bytes): encode into a stack buffer first, then a single ArrayBuffer creation with the exact size.

Listed separately from H1 because the encode-into infrastructure already exists and could be reused.
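
The bounded-size pattern described for H4 can be illustrated in user-land JavaScript. This is a sketch of the control flow only, not the proposed native change: `encodeSmall`, `scratch`, and the 256-byte threshold are illustrative, and whether it is faster in any given runtime is not measured here.

```javascript
const encoder = new TextEncoder();
const SCRATCH_SIZE = 256;
const scratch = new Uint8Array(SCRATCH_SIZE); // reused across calls

function encodeSmall(str) {
  // Each UTF-16 code unit encodes to at most 3 UTF-8 bytes, so this bound
  // guarantees the whole string fits in the scratch buffer.
  if (str.length * 3 <= SCRATCH_SIZE) {
    const { written } = encoder.encodeInto(str, scratch);
    // One exact-size allocation; slice() copies, so the result does not
    // alias the scratch buffer.
    return scratch.slice(0, written);
  }
  // Larger inputs fall back to the regular allocating path.
  return encoder.encode(str);
}

console.log(encodeSmall("hello world!").byteLength);
```

The size check uses the worst-case expansion rather than the actual encoded length, so it needs no second pass over the string.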
Non-finding: `decode(1 MB ASCII)` is 6.5× faster than Node (architectural win)

Documented here so the team doesn't accidentally undo it: `ext/web/lib.rs:528-535` and the matching `libs/core/ops_builtin_v8.rs:545-551` both short-circuit through `v8::simdutf::validate_ascii` → `v8::String::new_from_one_byte`. Pure ASCII (the dominant real-world shape: HTTP/JSON bodies, file reads, console output) skips V8's internal UTF-8 validation entirely. This single decision puts Deno's bulk-ASCII decode well ahead of every other runtime measured.
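
The shape of that short-circuit (validate as ASCII, then build a one-byte string, else fall back to full UTF-8 decoding) can be mirrored in plain JavaScript. The sketch below only illustrates the control flow; `decodeWithAsciiShortcut` is a hypothetical name, the scalar loop stands in for the SIMD validation, and it makes no claim about performance.

```javascript
const fallback = new TextDecoder();

function decodeWithAsciiShortcut(bytes) {
  // Validation pass: any byte with the high bit set needs real UTF-8 decoding.
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] > 0x7f) return fallback.decode(bytes);
  }
  // Pure ASCII: every byte maps directly to one code unit.
  let out = "";
  const CHUNK = 8192; // keep argument lists passed to apply() small
  for (let i = 0; i < bytes.length; i += CHUNK) {
    out += String.fromCharCode.apply(null, bytes.subarray(i, i + CHUNK));
  }
  return out;
}

const ascii = new TextEncoder().encode("GET /index.html HTTP/1.1");
console.log(decodeWithAsciiShortcut(ascii));
```

Any change that routes ASCII input back through the general decoder would remove the branch this sketch models.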
Reproduction
See `tools/perf_research/text-encoding/README.md`.