Skip to content

implement v8::String and v8::String::Utf8Value in Rust#6364

Merged
anonrig merged 3 commits intomainfrom
yagiz/add-rust-string-utf8value
Mar 23, 2026
Merged

implement v8::String and v8::String::Utf8Value in Rust#6364
anonrig merged 3 commits intomainfrom
yagiz/add-rust-string-utf8value

Conversation

@anonrig
Copy link
Member

@anonrig anonrig commented Mar 19, 2026

Adds all necessary methods to use v8::String and v8::Utf8Value in Rust

@anonrig anonrig requested review from a team as code owners March 19, 2026 19:32
@anonrig anonrig requested review from guybedford, jasnell and mikea March 19, 2026 19:33
Copy link
Contributor

@ask-bonk ask-bonk bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm Bonk, and I've done a quick review of your PR.

This PR adds Rust bindings for v8::String and v8::String::Utf8Value, including FFI functions, safe Rust wrappers, and comprehensive tests.

Issues found (ranked by severity)

  1. NewStringType parameter is silently ignored — The Rust API accepts NewStringType::Internalized but the C++ FFI always hardcodes v8::NewStringType::kNormal. A caller passing Internalized gets Normal behavior with no warning. Either pass the enum through to C++ or remove the parameter and document the limitation.

  2. BitOr for WriteFlags returns i32, not WriteFlagsWriteFlags::NullTerminate | WriteFlags::ReplaceInvalidUtf8 produces an i32, which can't be passed to write(), write_one_byte(), or write_utf8() since they accept WriteFlags. The combined variant NullTerminateAndReplaceInvalidUtf8 works around this, but the BitOr impl is misleading since its result is unusable with the API.

  3. to_rust_string always double-allocatesfrom_utf8_lossy(&buf).into_owned() allocates a second String even when the bytes are already valid UTF-8 (the common case). Can avoid this with String::from_utf8(buf).unwrap_or_else(|e| String::from_utf8_lossy(e.as_bytes()).into_owned()).

  4. Duplicate function declarations in ffi-inl.h — The four utf8_value_* function declarations at ffi-inl.h:67-70 are identical to those in ffi.h:168-171. Since ffi-inl.h already includes ffi.h, the redeclarations are redundant.

@ask-bonk
Copy link
Contributor

ask-bonk bot commented Mar 19, 2026

Review posted successfully on PR #6364 with 4 inline comments covering:

  1. NewStringType silently ignored (medium) -- C++ FFI hardcodes kNormal, making Internalized a no-op
  2. BitOr<WriteFlags> returns i32 (medium) -- combined flags can't be passed to write methods
  3. to_rust_string double-allocation (low) -- from_utf8_lossy().into_owned() always allocates even for valid UTF-8
  4. Duplicate declarations in ffi-inl.h (low) -- redundant re-declarations already present in ffi.h

No backward compatibility issues, security concerns, or memory safety problems were found. The wrappable.rs changes are straightforward renames from String to std::string::String to resolve the name collision with the new v8::String marker type. Test coverage is thorough with 30+ tests.

github run

@codspeed-hq
Copy link

codspeed-hq bot commented Mar 19, 2026

Merging this PR will degrade performance by 26.55%

❌ 2 regressed benchmarks
✅ 68 untouched benchmarks
⏩ 129 skipped benchmarks1

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Benchmark BASE HEAD Efficiency
simpleStringBody[Response] 20.1 µs 27.4 µs -26.55%
Encode_ASCII_256[TextEncoder][0/0/256] 3.1 ms 3.5 ms -11.09%

Comparing yagiz/add-rust-string-utf8value (bf864c7) with main (b4fc0eb)

Open in CodSpeed

Footnotes

  1. 129 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@anonrig anonrig force-pushed the yagiz/add-rust-string-utf8value branch from 48d310c to b02544b Compare March 19, 2026 20:50
@anonrig
Copy link
Member Author

anonrig commented Mar 19, 2026

/bigbonk roast this pr

Copy link
Contributor

@ask-bonk ask-bonk bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review — my four prior findings have all been addressed, thank you. This pass focuses on handle lifetime safety in the new MaybeLocal type and a minor soundness edge case in Utf8Value.

New findings

  1. MaybeLocal<T> leaks V8 handle slots when dropped without being unwrapped (high) — no Drop impl, so non-empty handles are silently leaked on error/early-return paths.
  2. MaybeLocal::to_local(&self) aliases the handle without cloning (medium) — both the MaybeLocal and the returned Local reference the same slot; this becomes a double-free if Drop is added, or a use-after-free if the Local is dropped first and the MaybeLocal is later unwrapped.
  3. Utf8Value::as_bytes is unsound on null data pointer (low) — V8 returns null from operator*() when conversion fails; slice::from_raw_parts(null, 0) is UB even for zero length.
  4. Two of @jasnell's documentation requests still appear unaddressed.

This review was generated by an AI assistant (ask-bonk). Findings may contain inaccuracies — please verify before acting on them.

@ask-bonk
Copy link
Contributor

ask-bonk bot commented Mar 19, 2026

Review posted successfully on PR #6364 with 5 inline comments covering:

  1. MaybeLocal<T> missing Drop impl (high) — Non-empty handles leak V8 handle slots when dropped without being unwrapped. Every other handle wrapper (Local, Global, Utf8Value) implements Drop but MaybeLocal does not. Suggested a Drop impl.

  2. MaybeLocal::to_local(&self) handle aliasing (medium) — Takes &self and copies the raw pointer into a new Local without calling local_clone, creating two owners of the same handle slot. This is a double-free once Drop is added, or a use-after-free in certain sequences. Suggested either cloning the handle or making to_local consume self.

  3. Utf8Value::as_bytes UB on null pointer (low) — V8 returns null from operator*() when value conversion fails (e.g., Symbol). slice::from_raw_parts(null, 0) is UB in Rust even for zero-length slices. Suggested a null guard.

  4. jasnell's Utf8Value documentation request — Suggested clearer wording about the copy semantics at ffi.h:171-173.

  5. jasnell's IsOneByte false negatives request — Suggested adding explicit "false negatives are possible" phrasing to the comment at ffi.c++:226-231.

All four of my prior findings (NewStringType ignored, BitOr returning i32, to_rust_string double-allocation, duplicate ffi-inl.h declarations) have been addressed in this revision.

github run

@anonrig anonrig force-pushed the yagiz/add-rust-string-utf8value branch from 9cde9d4 to 1bf7016 Compare March 20, 2026 16:44
@anonrig anonrig requested a review from jasnell March 20, 2026 16:48
@anonrig anonrig force-pushed the yagiz/add-rust-string-utf8value branch 2 times, most recently from a1208cb to 1ff585d Compare March 20, 2026 19:36
Copy link
Collaborator

@jasnell jasnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few items that I don't believe have been flagged yet in prior review rounds:

1. WriteFlags::BitOr transmute produces UB (HIGH)

The original BitOr returning i32 (flagged by Bonk round 1) was fixed by making it return WriteFlags — but the fix uses transmute(self.bits() | rhs.bits()) which can produce value 3, an invalid enum discriminant. Per the Rust Reference: "It is undefined behavior to create an enum value with a discriminant that does not match any of its variants." This applies even to #[repr(i32)] enums. Miri would flag this.

Two clean fixes:

  • Add a variant: NullTerminateAndReplaceInvalidUtf8 = 3
  • Newtype bitflags (more idiomatic): replace the enum with struct WriteFlags(i32) + associated constants

(Comment posted on the transmute line below.)

2. write() / write_one_byte() buffer overflow check is debug_assert only (MEDIUM)

These methods use debug_assert!(buffer.len() >= length as usize) which is stripped in release builds. If length > buffer.len(), V8 writes past the buffer — heap buffer overflow. write_utf8 correctly uses buffer.len() as capacity. Either clamp length to buffer.len() or promote to a hard assert!.

3. Utf8Value::new ownership semantics (MEDIUM)

Utf8Value::new takes Local<'_, Value> by value, consuming it. In C++, v8::String::Utf8Value's constructor takes a Local<Value> by value which is a cheap copy. The Rust API forces callers to .clone() the Local if they want to keep using it. Consider taking &Local instead and cloning internally for the FFI call.

4. to_local "clone" comment is misleading (LOW)

The comment on to_local says "Clone the handle so self and the returned Local own independent handle slots" — but local_clone is a bitwise pointer copy (Local{.ptr = value.ptr}). V8 Local handles are non-owning references into the HandleScope — there are no independent "handle slots" being created. The comment should reflect that this is a pointer copy, not a deep clone.

5. Missing copyright header on string.rs (LOW)

The new test file doesn't have the project copyright/license header.

jasnell

This comment was marked as duplicate.

anonrig added 2 commits March 23, 2026 11:09
- Add Drop impl for MaybeLocal<T>: non-empty handles were silently
  leaked when dropped without being unwrapped
- Fix MaybeLocal::to_local(&self) handle aliasing: clone the handle
  via local_clone so MaybeLocal and the returned Local own independent
  slots (without this, adding Drop would cause a double-free)
- Optimise consuming paths (into_option, unwrap, unwrap_or) to
  transfer ownership via ManuallyDrop rather than cloning
- Fix Utf8Value::as_bytes UB: slice::from_raw_parts with a null
  pointer is UB even for zero-length slices; guard with a null check
- Expand Utf8Value doc comments (struct, new, length, as_ptr) to
  document copy semantics, data lifetime, and null-pointer behaviour
  on unconvertible values (e.g. Symbol)
- Note that IsOneByte false negatives are possible per V8 API
@anonrig anonrig force-pushed the yagiz/add-rust-string-utf8value branch from 1ff585d to c6aeee0 Compare March 23, 2026 15:40
@anonrig
Copy link
Member Author

anonrig commented Mar 23, 2026

/bigbonk roast this pr

Copy link
Contributor

@ask-bonk ask-bonk bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review on revision c6aeee0. All 15 findings from prior rounds (my two reviews, plus @jasnell's and @guybedford's feedback) have been addressed, with two exceptions and one new finding.

New finding

  1. Hash/Eq contract violation on Local<String> (HIGH) — Eq compares by content (StringEquals) but Hash uses GetIdentityHash() which is identity-based. Two non-internalized strings with equal content can have different identity hashes, violating Rust's requirement that a == b implies hash(a) == hash(b). Using Local<String> as a HashMap key will produce silent data corruption.

Still open from prior rounds

  1. bool internalized FFI parameter (MEDIUM) — @jasnell asked for an enum-like type instead of the bare bool. CXX supports shared enums across the bridge, so this is feasible. Still a bool in the current revision.
  2. WriteFlags transmute (LOW) — The NullTerminateAndReplaceInvalidUtf8 = 3 variant makes the transmute sound, but both @jasnell and @guybedford suggested the newtype pattern (struct WriteFlags(i32) + associated constants) as more idiomatic Rust. The enum+transmute approach works but is unusual.

This review was generated by an AI assistant (ask-bonk). Findings may contain inaccuracies — please verify before acting on them.

@ask-bonk
Copy link
Contributor

ask-bonk bot commented Mar 23, 2026

Review posted successfully on PR #6364 with 3 inline comments covering:

  1. Hash/Eq contract violation on Local<String> (HIGH) — The Hash impl uses GetIdentityHash() (identity-based) while Eq uses StringEquals() (content-based). This violates Rust's fundamental Hash/Eq invariant and will silently corrupt HashMap/HashSet lookups. Recommended removing the Hash impl or computing hash from content.

  2. bool internalized FFI parameter still unresolved (MEDIUM) — jasnell's request for an enum-like type instead of bare bool across the CXX bridge. CXX supports shared enums, making this feasible.

  3. WriteFlags transmute vs newtype (LOW) — The NullTerminateAndReplaceInvalidUtf8 = 3 variant makes the transmute sound, resolving the UB. Noted that both jasnell and guybedford suggested the newtype pattern as more idiomatic.

All 15 findings from prior review rounds have been addressed in this revision.

github run

@anonrig anonrig force-pushed the yagiz/add-rust-string-utf8value branch from c6aeee0 to 7a5d930 Compare March 23, 2026 16:21
- String::Concat — concatenates two v8::String handles
- String::InternalizeString — interns a string into V8's string table
- String::GetIdentityHash — returns the stable non-zero identity hash
- String::IsFlat — checks flat internal representation (via workerd V8 patch 0029)
- Local<String>::MAX_LENGTH — mirrors v8::String::kMaxLength
@anonrig anonrig force-pushed the yagiz/add-rust-string-utf8value branch from 7a5d930 to bf864c7 Compare March 23, 2026 16:24
@anonrig anonrig enabled auto-merge (squash) March 23, 2026 16:24
@anonrig anonrig merged commit d62d62a into main Mar 23, 2026
22 checks passed
@anonrig anonrig deleted the yagiz/add-rust-string-utf8value branch March 23, 2026 16:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants