perf research: crypto.subtle + getRandomValues #6

Draft
crowlbot wants to merge 1 commit into main from perf-research/crypto

@crowlbot
Owner

TL;DR

crypto.subtle.digest("SHA-256", smallBuf) is 7.8× slower than Bun (37 μs vs 4.8 μs). Cause: op_crypto_subtle_digest in ext/crypto/lib.rs calls spawn_blocking for every input, regardless of size. For an 11-byte message, the actual SHA-256 work is <100 ns but the thread-pool dispatch adds ~30 μs. Graduating to an upstream fix that runs the digest synchronously on the calling thread for small inputs.

Strong positive (worth preserving): crypto.getRandomValues(buf16) is 43× faster than Node 22 LTS. Deno's sync getrandom path is excellent.

Headline ratios

| Bench | Deno | Node 22 | Bun | Deno vs Node | Deno vs Bun |
| --- | ---: | ---: | ---: | --- | --- |
| digest_sha256_short | 37,000 | 42,500 | 4,800 | 1.15× faster | 7.77× slower |
| digest_sha1_short | 30,500 | 41,500 | 3,900 | 1.36× faster | 7.90× slower |
| digest_sha256_64k | 286,000 | 242,000 | 266,000 | 1.18× slower | match |
| import_hmac_raw | 17,500 | 35,500 | 3,700 | 2× faster | 4.73× slower |
| getrandom_16 | 135 | 5,853 | 77 | 43.5× FASTER | 1.74× slower |
| getrandom_32 | 164 | 5,924 | 81 | 36.2× FASTER | 2.03× slower |
| getrandom_16k | 10,300 | 11,000 | 5,740 | match | 1.79× slower |
| randomUUID | 297 | 256 | 200 | 1.16× slower | 1.48× slower |
| aes_gcm_encrypt_256 | 52,900 | 56,900 | 32,600 | match | 1.62× slower |
| aes_gcm_decrypt_256 | 52,800 | 58,700 | 41,700 | match | 1.27× slower |
| hmac_sign_reused | 45,055 | 44,578 | 33,057 | match | 1.36× slower |
| pbkdf2_sha256_10k | 6,458,332 | 10,685,036 | 6,806,746 | 1.65× faster | match |

Times in ns/op (1 μs = 1,000 ns). Full data in profiles/crypto_all.log.
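For readers checking the ratio columns, here is a minimal sketch of how such a ratio can be derived from two ns/op figures. The function name `compare` is hypothetical and not part of the benchmark harness. The report does not state the cutoff it uses for "match", so the sketch leaves that label out. Rows whose timings are printed rounded (for example 37,000 vs 4,800) may not reproduce the listed ratio exactly.

```rust
/// Describes how runtime A compares to runtime B, given mean ns/op for each.
/// Lower ns/op is better, so the larger time is always the numerator.
pub fn compare(a_ns: f64, b_ns: f64) -> String {
    if a_ns <= b_ns {
        // A takes less time per op than B.
        format!("{:.2}× faster", b_ns / a_ns)
    } else {
        // A takes more time per op than B.
        format!("{:.2}× slower", a_ns / b_ns)
    }
}
```

Applied to the hmac_sign_reused row (Deno 45,055 ns vs Bun 33,057 ns), this gives "1.36× slower", which agrees with the table.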

Where the cost lives — digest

ext/crypto/lib.rs:881-894:

```rust
#[op2]
pub async fn op_crypto_subtle_digest(
  #[serde] algorithm: CryptoHash,
  #[buffer] data: JsBuffer,
) -> Result<Uint8Array, CryptoError> {
  let output = spawn_blocking(move || {
    digest::digest(algorithm.into(), &data)
      .as_ref()
      .to_vec()
      .into()
  })
  .await?;
  Ok(output)
}
```

spawn_blocking dispatches to tokio's blocking-thread pool. For inputs large enough to consume meaningful CPU time, the dispatch is appropriate. For small inputs the trade-off inverts: dispatch costs ~30 μs, while a single SHA-256 block on hardware-accelerated x86 takes well under 100 ns.
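The shape of a size-based fast path can be sketched with the standard library alone. This is an illustration under stated assumptions, not the upstream patch: `std::thread::spawn` stands in for tokio's `spawn_blocking`, `toy_digest` (64-bit FNV-1a, not cryptographic) stands in for the real digest, and the names `INLINE_THRESHOLD_BYTES`, `Path`, and `digest_with_fast_path` are hypothetical. The 64 KB cutoff is the figure suggested in this report.

```rust
use std::thread;

/// Hypothetical cutoff: inputs at or below this size stay on the calling thread.
/// 64 KB is the figure suggested in the report; the right value is an assumption.
pub const INLINE_THRESHOLD_BYTES: usize = 64 * 1024;

/// Which path a call took. Returned only so the choice is observable.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Path {
    Inline,
    Offloaded,
}

/// Stand-in for the real digest: 64-bit FNV-1a. NOT cryptographic. It only
/// gives the sketch deterministic work to do without external crates.
pub fn toy_digest(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV-1a 64-bit offset basis
    for &byte in data {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x100000001b3); // FNV-1a 64-bit prime
    }
    hash
}

/// Hashes small inputs inline and hands large inputs to another thread.
pub fn digest_with_fast_path(data: Vec<u8>) -> (u64, Path) {
    if data.len() <= INLINE_THRESHOLD_BYTES {
        // Small input: the work is cheaper than any thread hand-off.
        (toy_digest(&data), Path::Inline)
    } else {
        // Large input: move the buffer to a worker (stand-in for spawn_blocking).
        let handle = thread::spawn(move || toy_digest(&data));
        (handle.join().expect("digest worker panicked"), Path::Offloaded)
    }
}
```

Both paths return the same value for the same input. Only the thread on which the work runs differs, which is why the cutoff can be tuned without changing results.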

V8 prof attribution (200k iters of digest("SHA-256", "hello world"))

  • ~31 % of ticks in unattributed Deno binary (spawn_blocking + digest). Native flamegraphs blocked on this host (perf_event_paranoid=4, no sudo).
  • ~5 % in libc malloc.
  • Small but consistent ticks in async/promise plumbing (AsyncFunctionAwaitResolveClosure, ResumeGeneratorTrampoline).

Ranked hypotheses

| Rank | Hypothesis | Impact × Confidence | Notes |
| --- | --- | --- | --- |
| H1 | op_crypto_subtle_digest uses spawn_blocking for every input. Short-circuiting for small inputs (≤ 64 KB) recovers ~30 μs per call. | HIGH × HIGH | Bun does this. Graduating to upstream PR perf/crypto-digest-sync-small-inputs. |
| H2 | importKey HMAC raw is 4.7× slower than Bun. | medium × medium | Likely webidl + key-data conversion. Not attributed to a single architectural cost. Not graduating this tick. |
| H3 | digest SHA-1 short is the same 7.9× story. | subsumed by H1 | Same fix applies. |
| H4 | getRandomValues(16) 43× faster than Node | positive | Worth protecting. |
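To get a rough feel for the per-call cost that H1 targets, one can time the same tiny workload inline and behind a thread hand-off. The sketch below is an assumption-laden illustration, not the harness used for this report: `std::thread::spawn` creates a fresh thread per call, whereas tokio's blocking pool reuses threads, so the absolute numbers will differ from the ~30 μs measured here. The names `ns_per_op`, `tiny_work`, and `measure_dispatch_overhead` are hypothetical.

```rust
use std::hint::black_box;
use std::thread;
use std::time::Instant;

/// Runs `op` `iters` times and returns the mean wall-clock cost in ns/op.
pub fn ns_per_op<F: FnMut()>(iters: u32, mut op: F) -> f64 {
    assert!(iters > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iters {
        op();
    }
    start.elapsed().as_nanos() as f64 / f64::from(iters)
}

/// Tiny stand-in workload: sums the bytes of a short message.
pub fn tiny_work(data: &[u8]) -> u64 {
    data.iter().map(|&b| u64::from(b)).sum()
}

/// Returns (inline ns/op, offloaded ns/op) for the same tiny workload.
pub fn measure_dispatch_overhead(iters: u32) -> (f64, f64) {
    let msg: &'static [u8] = b"hello world";

    // Work done directly on the calling thread.
    let inline = ns_per_op(iters, || {
        black_box(tiny_work(black_box(msg)));
    });

    // Same work, but each call pays for a thread spawn and a join.
    let offloaded = ns_per_op(iters, || {
        let handle = thread::spawn(move || tiny_work(msg));
        black_box(handle.join().expect("worker panicked"));
    });

    (inline, offloaded)
}
```

Calling `measure_dispatch_overhead(10_000)` and comparing the two returned values gives an order-of-magnitude view of hand-off cost on a given host. Results vary with the OS scheduler and machine load.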

What's not here

Layout

```
tools/perf_research/crypto/
  README.md                 full report
  micro/crypto_micro.js     13 ops covering getrandom, digest, hmac, aes-gcm, pbkdf2
  profiles/crypto_all.log   raw bench output per runtime
  profiles/digest.prof.txt  V8 --prof for the digest hot path
  profiles/versions.txt     runtime versions + host caps
```

Headline: crypto.subtle.digest of a small message is 7.8x slower than
Bun (37 us vs 4.8 us) and roughly the same as Node 22. Cause is
op_crypto_subtle_digest in ext/crypto/lib.rs calling spawn_blocking on
every call. For an 11-byte SHA-256 the actual work is <100 ns but
thread-pool dispatch adds ~30 us. Fix is a sync small-input fast path.

Strong positive: getRandomValues(16) is 43x faster than Node LTS.
Worth preserving.

Other ratios for HMAC, AES-GCM, PBKDF2 are competitive with both
Node and Bun. Graduating only the digest finding to upstream.