perf research: url #2
Draft
crowlbot wants to merge 1 commit into
- `micro/url_micro.js`: 14 ops covering construct/canParse/setters/searchParams on URL, and construct/get/has/toString on URLSearchParams.
- `micro_results.jsonl`: per-runtime ns/op from a single sweep across Deno 2.7.14, Node v22.22.2, Bun 1.3.14.
- `profiles/url_micro.prof.txt` + `.v8.log.gz`: V8 `--prof` in-process profile of the bench (perf/samply blocked by container caps; `--prof` always works).
perf research: url
Macro performance research on Deno's implementation of `URL` and `URLSearchParams`.

This PR contains only benchmark scripts and committed V8 prof artifacts, with no production code changes. The report below is the deliverable.
Methodology
- Absolute ns/op is unreliable, so the headline is same-host ratios vs Node 22 LTS and Bun.
- V8 `--prof` in-process. `perf`/`samply` need `kernel.perf_event_paranoid<=1`, but the container is locked at `3` and `sysctl` is denied, so all attribution below is JS-side. Native servo/`url` crate time is captured under the "deno binary" bucket but cannot be broken down further without `CAP_PERFMON`.
- Ops hit: construct (with and without base), `canParse`, getters (`href`/`pathname`/`search`), setters (`pathname`/`search`), `searchParams.get`, plus URLSearchParams construct (string + object), `get`, and `toString`.
- Pinned: `deno 2.7.14 (v8 14.7.173.20-rusty)`, `node v22.22.2`, `bun 1.3.14`.

Headline ratios: microbench (ns/op; lower is better)
URL
- `new URL("https://example.com/path?x=1#y")`
- `new URL("/p?x=1", base)`
- `new URL("…/api?a=1&b=2&c=3&d=4&e=5")`
- `URL.canParse("https://example.com/path")`
- `url.href`
- `url.pathname`
- `url.search`
- `url.pathname = "/new/path"`
- `url.search = "?new=query"`
- `url.searchParams.get("a")`

URLSearchParams
- `new URLSearchParams("a=1&b=2&…&h=8")` (8 pairs)
- `new URLSearchParams(obj)` (6 pairs)
- `usp.get("c")`
- `usp.toString()` (8 pairs)

Reads:
- `new URL(...)` is within ±25 % of Node and slightly faster than Bun on the simple case. Servo's `url` crate is doing fine.
- `url.pathname` is ~12 ns vs Bun's 53 ns, because the offset-into-serialization design avoids reparsing on read.
- `url.pathname = "/x"` is 769 ns, competitive with Node (694) but still ~60× more expensive than the corresponding getter. Each call op-dispatches into Rust and re-parses the URL from its serialized string.
- `URLSearchParams.toString()` is 5.7× slower than Node and 3.9× slower than Bun. The op-dispatch overhead dwarfs the actual serialization for small param sets (8 pairs).
- `URLSearchParams.get()` is 2.5× slower than Node and 1.8× slower than Bun. Storage is `[[name, value], …]` walked with `===` on every call: no index, no hash.
- `new URLSearchParams("a=1&b=2&…")` is 6.8× slower than Node. Node parses in pure JS without an op call; Deno op-dispatches into Rust's `form_urlencoded::parse`. For a 24-byte query string, the dispatch dominates the parse.
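For readers who want to sanity-check the getter/setter spread on their own host, here is a minimal sketch of a timing loop and a ratio helper. It is not the harness in `micro/url_micro.js`; the names `nsPerOp` and `ratio`, the warm-up count, and the iteration count are all assumptions made for illustration.

```javascript
// Minimal timing loop (hypothetical helper, not the committed benchmark harness).
function nsPerOp(fn, iterations = 200_000) {
  for (let i = 0; i < 10_000; i++) fn(); // warm-up so the JIT has a chance to optimize
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  const elapsedMs = performance.now() - start;
  return (elapsedMs * 1e6) / iterations; // milliseconds -> nanoseconds, per call
}

// Ratio of one runtime's ns/op to a baseline's, rounded to one decimal place.
function ratio(ns, baselineNs) {
  return Number((ns / baselineNs).toFixed(1));
}

const url = new URL("https://example.com/path?x=1#y");
let sink = 0; // accumulates getter results so the read is not trivially dead code

const getterNs = nsPerOp(() => { sink += url.pathname.length; });
const setterNs = nsPerOp(() => { url.pathname = "/new/path"; });

console.log(`getter ${getterNs.toFixed(1)} ns/op, setter ${setterNs.toFixed(1)} ns/op`);
// Cross-runtime ratio from figures quoted in this report (Deno 1 483 ns vs Node 219 ns).
console.log(`construct: ${ratio(1483, 219)}x slower than Node`);
```

Running the same file under each runtime on the same host and comparing the printed values is what makes the ratios meaningful; a single run's absolute numbers should not be read too closely.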
Flamegraph attribution (V8 `--prof`)

Full profile committed at `tools/perf_research/url/profiles/url_micro.prof.txt` (raw log: `url_micro.v8.log.gz`).

Top of `Statistical profiling result` (1 559 ticks):

77.5 % of total ticks land inside the Deno binary, i.e. servo's `url` crate and the op-dispatch boundary. JS-side time is dominated by the `record<USVString, USVString>` converter (`USP{...}` init) and the USVString well-formed pass.
Where the cost lives
- URL setters re-parse the whole URL (`ext/web/url.rs:127-191`): `op_url_reparse` calls `Url::options().parse(&href)` before applying the setter, even though the JS side could pass through component offsets for in-place mutation.
- `URLSearchParams.toString()` op-dispatches to Rust for serialization (`ext/web/00_url.js:332-335` ↔ `ext/web/url.rs:213-221`): for 8-pair input the op dispatch is ≥80 % of the cost; Node does this in pure JS.
- `new URLSearchParams("…")` op-dispatches to `form_urlencoded::parse` (`ext/web/00_url.js:142` ↔ `ext/web/url.rs:195-211`).
- `URLSearchParams` storage is `[[name, value], …]` with linear-scan `.get`/`.has`/`.set` (`ext/web/00_url.js:243-256` (get), `:263-276` (has), `:282-318` (set)): same pattern as `Headers`, an O(n) scan with `===` compare. Acceptable for the typical small N but explains the `.get` 2.5× gap vs Node.
- `op_url_parse` + `op_url_get_serialization` is a two-op sequence on the "needs serialization" path (`ext/web/url.rs:38-40`, `ext/web/00_url.js:100-110`).
- `op_url_parse_search_params` eagerly collects into `Vec<(String, String)>` (`ext/web/url.rs:198-210`): it allocates `String`s for every pair of an N-pair query, even when the caller only wants `searchParams.get("first")`.
- `URL.canParse` does a full parse rather than a validation pass (`ext/web/00_url.js:446-455` ↔ `op_url_parse`): the `url` crate doesn't expose a validation-only entry point, so `canParse` pays the full parse + components-buf write.

Ranked architectural hypotheses
H1: Every URL setter (`pathname`, `search`, `hash`, `host`, …) re-parses the entire URL string in Rust on every call (HIGH × HIGH)
- `url.pathname = "/x"` is 769 ns vs the `url.pathname` getter at 13 ns, a 60× spread.
- Every setter in `ext/web/00_url.js:526-882` calls `opUrlReparse(this.#serialization, SET_*, value)`, which in `ext/web/url.rs:127-191` does `Url::options().parse(&href)` from scratch before applying the setter.
- The JS side keeps its state cached in private fields, but the Rust side doesn't keep a `Url` around per URL instance: every setter rebuilds the parse state from the serialized string. The component buffer is updated, but the Rust state is discarded.
- A `Url`-resource model (op returns an `rid`, setters operate on the rid) would amortize the parse cost across the lifetime of the URL. For multi-setter code paths (`url.pathname = …; url.search = …; url.hash = …`) this is a 3× reduction; for single-setter use it shaves the ~700 ns reparse to ~200 ns (apply + reserialize). Most server-side routing libraries set `pathname`/`search` per request, so this is a hot path.

H2: `URLSearchParams.toString()` op-dispatches to Rust for what can be a 5-line JS loop (MEDIUM × HIGH)
- `usp.toString()` (8 pairs) is 1 990 ns in Deno vs 351 ns in Node and 518 ns in Bun: 5.7× and 3.9× slower respectively.
- The op call (`ext/web/url.rs:213-221`) takes the full `Vec<(String, String)>`, runs `form_urlencoded::Serializer::new(...).extend_pairs(...).finish()`, and returns the resulting `String`: a full crossing of the V8/Rust boundary plus a `Vec<(String, String)>` allocation in the marshaling layer.
- For small param sets (nearly every real use), the op-dispatch and `Vec<(String, String)>` allocation dominate the actual serialization. A pure-JS encoder (Node's path) ends up faster.
- Serialize in pure JS for the common case (escape table + simple loop, ~30 lines). Brings `toString()` in line with Node (~350 ns), a 5× win on this op. Affects any code that calls `usp.toString()` for redirect URLs, query rebuilding, fetch URL construction, etc.
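As a rough illustration of the pure-JS serializer idea in H2, the sketch below serializes a `[[name, value], …]` list without leaving JS. It is not Deno's or Node's implementation: `serializePairs` and `encodeFormComponent` are hypothetical names, and it leans on `encodeURIComponent` plus a fix-up pass rather than a hand-written escape table. It assumes names and values are already well-formed strings (no lone surrogates), since `encodeURIComponent` throws on those.

```javascript
// encodeURIComponent leaves ! ' ( ) ~ unescaped and encodes space as %20;
// application/x-www-form-urlencoded wants those five escaped and space as "+".
function encodeFormComponent(s) {
  return encodeURIComponent(s)
    .replace(/%20/g, "+")
    .replace(/[!'()~]/g, (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase());
}

// Serialize an ordered list of [name, value] pairs (hypothetical helper).
function serializePairs(pairs) {
  let out = "";
  for (let i = 0; i < pairs.length; i++) {
    if (i > 0) out += "&";
    out += encodeFormComponent(pairs[i][0]) + "=" + encodeFormComponent(pairs[i][1]);
  }
  return out;
}

const pairs = [["a b", "c&d"], ["é", "~!*'()"], ["x", "1+1=2"]];
console.log(serializePairs(pairs));
// Cross-check against the runtime's own serializer.
console.log(serializePairs(pairs) === new URLSearchParams(pairs).toString());
```

Whether a loop like this actually beats the op call would need to be measured with the same microbench; the two regex passes here are a simplification that a real escape table would avoid.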
H3: `new URLSearchParams("…")` op-dispatches to `form_urlencoded::parse` even for short queries (MEDIUM × HIGH)
- `new URLSearchParams("a=1&b=2&…&h=8")` (8 pairs, 24 bytes) is 1 483 ns in Deno vs 219 ns in Node: 6.8× slower.
- `ext/web/00_url.js:142` calls `op_url_parse_search_params(init)`; the op eagerly collects into `Vec<(String, String)>` (`url.rs:198-210`) and returns it. The marshaling cost outweighs the actual parsing for short inputs.
- Parse in pure JS below a length threshold (e.g. ≤256 bytes). Brings construct in line with Node (~220 ns), a 6× win. Note that this is also the path used by `url.searchParams = ...` and by the `searchParams` getter on first access, so it compounds.
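The threshold idea in H3 could look roughly like the sketch below: short, well-behaved inputs are parsed in JS, and anything long or malformed falls back to the slow path. All names here (`FAST_PATH_MAX_LENGTH`, `parseQuery`, `parseSlow`) are hypothetical, and `parseSlow` uses the built-in `URLSearchParams` purely as a stand-in for the existing op call. The fast path assumes well-formed input strings and relies on `decodeURIComponent`, which throws on malformed escapes, so those inputs are routed to the fallback.

```javascript
const FAST_PATH_MAX_LENGTH = 256; // hypothetical threshold, taken from the "≤256 bytes" example

// Stand-in for the existing op-based parse (assumption: always correct, just slower).
function parseSlow(query) {
  return [...new URLSearchParams(query)];
}

// "+" means space in form encoding; the rest is ordinary percent-decoding.
function decodePart(s) {
  return decodeURIComponent(s.replace(/\+/g, " "));
}

function parseQuery(query) {
  if (query.length > FAST_PATH_MAX_LENGTH) return parseSlow(query);
  const input = query.startsWith("?") ? query.slice(1) : query;
  const pairs = [];
  try {
    for (const part of input.split("&")) {
      if (part === "") continue; // empty segments produce no pair
      const eq = part.indexOf("=");
      const name = eq === -1 ? part : part.slice(0, eq);
      const value = eq === -1 ? "" : part.slice(eq + 1);
      pairs.push([decodePart(name), decodePart(value)]);
    }
  } catch {
    return parseSlow(query); // malformed escape: let the lenient parser handle it
  }
  return pairs;
}

console.log(parseQuery("?x=1+2&y=%41&flag"));
```

The try/catch fallback keeps the fast path from having to reproduce the lenient decoding rules for broken percent escapes; a real implementation would need to decide whether that trade-off is acceptable.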
H4: `op_url_parse` plus `op_url_get_serialization` is a two-op call on the "needs serialization" path (MEDIUM × MEDIUM)
- The second op fires whenever the serialization differs from the input, i.e. any URL with casing normalization (`HTTPS://X` → `https://x`), percent-encoding, default port removal (`:443`/`:80`), or trailing-slash insertion. Real-world URLs hit this constantly.
- `ext/web/url.rs:96-102` stashes the serialized String into `OpState` and returns status=1; the JS then calls `op_url_get_serialization()` as a second op (`ext/web/00_url.js:100-110`) to take it out. This avoids returning a String on the status=0 fast path, but doubles the boundary crossings on the slow path.
- Return the serialization as an `Option<String>` (or a sentinel value in a `#[buffer]` arg) on the slow path: one op call instead of two. Estimated 80-150 ns savings on the slow path per URL, which fires on the majority of real-world inputs.
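To make the boundary-crossing count in H4 concrete, here is a toy model in plain JS. Nothing in it is real Deno code: the `op*` functions are hypothetical stand-ins that use the built-in `URL` for normalization, `stashed` plays the role of the String held in `OpState`, and each `op*` call is counted as one crossing. It only illustrates the call shape, not the cost.

```javascript
let crossings = 0;  // number of simulated JS -> Rust calls
let stashed = null; // stands in for the serialized String stashed in OpState

// Current shape: a status code, then a second call to fetch the string when needed.
function opParseTwoStep(href) {
  crossings++;
  const serialized = new URL(href).href;
  if (serialized === href) return 0; // fast path: input was already normalized
  stashed = serialized;
  return 1;
}
function opGetSerialization() {
  crossings++;
  const s = stashed;
  stashed = null;
  return s;
}

// Proposed shape: one call that returns null (fast path) or the new serialization.
function opParseOneStep(href) {
  crossings++;
  const serialized = new URL(href).href;
  return serialized === href ? null : serialized;
}

function parseTwoStep(href) {
  return opParseTwoStep(href) === 0 ? href : opGetSerialization();
}
function parseOneStep(href) {
  return opParseOneStep(href) ?? href;
}

// Run one parse and report how many crossings it took.
function countCrossings(parse, href) {
  crossings = 0;
  const result = parse(href);
  return { result, crossings };
}

console.log(countCrossings(parseTwoStep, "HTTPS://Example.com:443/a"));
console.log(countCrossings(parseOneStep, "HTTPS://Example.com:443/a"));
```

Both shapes cost one crossing when the input is already normalized; they differ only on the slow path, which is where the estimated savings would come from.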
H5: URL parsing is a hot path for `fetch` (URL is parsed at least twice per request) (MEDIUM × MEDIUM)
- `ext/fetch/lib.rs:436` parses the URL string again inside `op_fetch`, even though the JS side has already constructed a `URL` via the user's `new URL(...)` call (or passed a string that fetch internally wraps).
- The JS `URL` object is not passed to the Rust op (only its `.href` string crosses the boundary). The fetch op accepts a `String` and does `Url::parse(&url)` from scratch.
- If the JS `URL` held a Rust `Url` rid (per H1), fetch could consume it directly without a second parse. ~400 ns savings per `fetch(url)` call, compounding with the H1 setter savings for `fetch(new URL(...))` patterns. Listed separately from H1 because it spans `ext/web` and `ext/fetch`.

H6: `URLSearchParams` linear-scan `.get`/`.has`/`.set` (same pattern as `Headers`) (LOW × HIGH)
- `usp.get("c")` is 2.5× slower than Node, 1.8× slower than Bun. Storage is `[[name, value], …]` walked with `===` per entry. Typical real-world N is small (≤8), so this is individually small, but it's present at every searchParams access.
- `ext/web/00_url.js:243-256`. No index, no hash. Spec compliance requires preserving insertion order; an additional `Map<name, indexInList>` index would speed up lookups while keeping the list as the source of truth.
- For small N, constant factors matter more than the asymptotic complexity. Single-digit-percent for typical N; included here only because the same pattern is the leading finding in `perf-research/fetch` for `Headers`, and a single shared implementation strategy would address both.
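The `Map<name, indexInList>` idea from H6 can be sketched as follows. This is an illustration only, not a proposed patch: `IndexedParams` is a hypothetical class, it covers just `append`/`get`/`has`/`delete`, and it rebuilds the index on delete for simplicity. The ordered list stays the source of truth, and serialization is delegated to the built-in `URLSearchParams` to keep the example short.

```javascript
class IndexedParams {
  #list = [];              // ordered [name, value] pairs: the source of truth
  #firstIndex = new Map(); // name -> index of its first occurrence in #list

  append(name, value) {
    if (!this.#firstIndex.has(name)) this.#firstIndex.set(name, this.#list.length);
    this.#list.push([name, value]);
  }

  // O(1) lookup of the first value for a name, instead of scanning the list.
  get(name) {
    const i = this.#firstIndex.get(name);
    return i === undefined ? null : this.#list[i][1];
  }

  has(name) {
    return this.#firstIndex.has(name);
  }

  // Removing entries shifts positions, so the index is rebuilt from the list.
  delete(name) {
    if (!this.#firstIndex.has(name)) return;
    this.#list = this.#list.filter(([n]) => n !== name);
    this.#firstIndex.clear();
    this.#list.forEach(([n], i) => {
      if (!this.#firstIndex.has(n)) this.#firstIndex.set(n, i);
    });
  }

  toString() {
    return new URLSearchParams(this.#list).toString();
  }
}

const params = new IndexedParams();
params.append("a", "1");
params.append("c", "3");
params.append("a", "2");
console.log(params.get("a"), params.get("c"), params.toString());
```

For the small N described above, the extra `Map` bookkeeping may cost more than the scan it replaces, which is consistent with this hypothesis being ranked LOW.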
Reproduction
See `tools/perf_research/url/README.md`.