implement v8::String and v8::String::Utf8Value in Rust#6364
Conversation
There was a problem hiding this comment.
I'm Bonk, and I've done a quick review of your PR.
This PR adds Rust bindings for v8::String and v8::String::Utf8Value, including FFI functions, safe Rust wrappers, and comprehensive tests.
Issues found (ranked by severity)
-
NewStringTypeparameter is silently ignored — The Rust API acceptsNewStringType::Internalizedbut the C++ FFI always hardcodesv8::NewStringType::kNormal. A caller passingInternalizedgetsNormalbehavior with no warning. Either pass the enum through to C++ or remove the parameter and document the limitation. -
BitOrforWriteFlagsreturnsi32, notWriteFlags—WriteFlags::NullTerminate | WriteFlags::ReplaceInvalidUtf8produces ani32, which can't be passed towrite(),write_one_byte(), orwrite_utf8()since they acceptWriteFlags. The combined variantNullTerminateAndReplaceInvalidUtf8works around this, but theBitOrimpl is misleading since its result is unusable with the API. -
to_rust_stringalways double-allocates —from_utf8_lossy(&buf).into_owned()allocates a secondStringeven when the bytes are already valid UTF-8 (the common case). Can avoid this withString::from_utf8(buf).unwrap_or_else(|e| String::from_utf8_lossy(e.as_bytes()).into_owned()). -
Duplicate function declarations in
ffi-inl.h— The fourutf8_value_*function declarations atffi-inl.h:67-70are identical to those inffi.h:168-171. Sinceffi-inl.halready includesffi.h, the redeclarations are redundant.
|
Review posted successfully on PR #6364 with 4 inline comments covering:
No backward compatibility issues, security concerns, or memory safety problems were found. The |
Merging this PR will degrade performance by 26.55%
Performance Changes
Comparing Footnotes
|
48d310c to
b02544b
Compare
|
/bigbonk roast this pr |
There was a problem hiding this comment.
Follow-up review — my four prior findings have all been addressed, thank you. This pass focuses on handle lifetime safety in the new MaybeLocal type and a minor soundness edge case in Utf8Value.
New findings
MaybeLocal<T>leaks V8 handle slots when dropped without being unwrapped (high) — noDropimpl, so non-empty handles are silently leaked on error/early-return paths.MaybeLocal::to_local(&self)aliases the handle without cloning (medium) — both theMaybeLocaland the returnedLocalreference the same slot; this becomes a double-free ifDropis added, or a use-after-free if theLocalis dropped first and theMaybeLocalis later unwrapped.Utf8Value::as_bytesis unsound on null data pointer (low) — V8 returns null fromoperator*()when conversion fails;slice::from_raw_parts(null, 0)is UB even for zero length.- Two of @jasnell's documentation requests still appear unaddressed.
This review was generated by an AI assistant (ask-bonk). Findings may contain inaccuracies — please verify before acting on them.
|
Review posted successfully on PR #6364 with 5 inline comments covering:
All four of my prior findings (NewStringType ignored, BitOr returning i32, to_rust_string double-allocation, duplicate ffi-inl.h declarations) have been addressed in this revision. |
9cde9d4 to
1bf7016
Compare
a1208cb to
1ff585d
Compare
jasnell
left a comment
There was a problem hiding this comment.
A few items that I don't believe have been flagged yet in prior review rounds:
1. WriteFlags::BitOr transmute produces UB (HIGH)
The original BitOr returning i32 (flagged by Bonk round 1) was fixed by making it return WriteFlags — but the fix uses transmute(self.bits() | rhs.bits()) which can produce value 3, an invalid enum discriminant. Per the Rust Reference: "It is undefined behavior to create an enum value with a discriminant that does not match any of its variants." This applies even to #[repr(i32)] enums. Miri would flag this.
Two clean fixes:
- Add a variant:
NullTerminateAndReplaceInvalidUtf8 = 3 - Newtype bitflags (more idiomatic): replace the enum with
struct WriteFlags(i32)+ associated constants
(Comment posted on the transmute line below.)
2. write() / write_one_byte() buffer overflow check is debug_assert only (MEDIUM)
These methods use debug_assert!(buffer.len() >= length as usize) which is stripped in release builds. If length > buffer.len(), V8 writes past the buffer — heap buffer overflow. write_utf8 correctly uses buffer.len() as capacity. Either clamp length to buffer.len() or promote to a hard assert!.
3. Utf8Value::new ownership semantics (MEDIUM)
Utf8Value::new takes Local<'_, Value> by value, consuming it. In C++, v8::String::Utf8Value's constructor takes a Local<Value> by value which is a cheap copy. The Rust API forces callers to .clone() the Local if they want to keep using it. Consider taking &Local instead and cloning internally for the FFI call.
4. to_local "clone" comment is misleading (LOW)
The comment on to_local says "Clone the handle so self and the returned Local own independent handle slots" — but local_clone is a bitwise pointer copy (Local{.ptr = value.ptr}). V8 Local handles are non-owning references into the HandleScope — there are no independent "handle slots" being created. The comment should reflect that this is a pointer copy, not a deep clone.
5. Missing copyright header on string.rs (LOW)
The new test file doesn't have the project copyright/license header.
- Add Drop impl for MaybeLocal<T>: non-empty handles were silently leaked when dropped without being unwrapped - Fix MaybeLocal::to_local(&self) handle aliasing: clone the handle via local_clone so MaybeLocal and the returned Local own independent slots (without this, adding Drop would cause a double-free) - Optimise consuming paths (into_option, unwrap, unwrap_or) to transfer ownership via ManuallyDrop rather than cloning - Fix Utf8Value::as_bytes UB: slice::from_raw_parts with a null pointer is UB even for zero-length slices; guard with a null check - Expand Utf8Value doc comments (struct, new, length, as_ptr) to document copy semantics, data lifetime, and null-pointer behaviour on unconvertible values (e.g. Symbol) - Note that IsOneByte false negatives are possible per V8 API
1ff585d to
c6aeee0
Compare
|
/bigbonk roast this pr |
There was a problem hiding this comment.
Follow-up review on revision c6aeee0. All 15 findings from prior rounds (my two reviews, plus @jasnell's and @guybedford's feedback) have been addressed, with two exceptions and one new finding.
New finding
Hash/Eqcontract violation onLocal<String>(HIGH) —Eqcompares by content (StringEquals) butHashusesGetIdentityHash()which is identity-based. Two non-internalized strings with equal content can have different identity hashes, violating Rust's requirement thata == bimplieshash(a) == hash(b). UsingLocal<String>as aHashMapkey will produce silent data corruption.
Still open from prior rounds
bool internalizedFFI parameter (MEDIUM) — @jasnell asked for an enum-like type instead of the barebool. CXX supports shared enums across the bridge, so this is feasible. Still aboolin the current revision.WriteFlagstransmute (LOW) — TheNullTerminateAndReplaceInvalidUtf8 = 3variant makes the transmute sound, but both @jasnell and @guybedford suggested the newtype pattern (struct WriteFlags(i32)+ associated constants) as more idiomatic Rust. The enum+transmute approach works but is unusual.
This review was generated by an AI assistant (ask-bonk). Findings may contain inaccuracies — please verify before acting on them.
|
Review posted successfully on PR #6364 with 3 inline comments covering:
All 15 findings from prior review rounds have been addressed in this revision. |
c6aeee0 to
7a5d930
Compare
- String::Concat — concatenates two v8::String handles - String::InternalizeString — interns a string into V8's string table - String::GetIdentityHash — returns the stable non-zero identity hash - String::IsFlat — checks flat internal representation (via workerd V8 patch 0029) - Local<String>::MAX_LENGTH — mirrors v8::String::kMaxLength
7a5d930 to
bf864c7
Compare
Adds all necessary methods to use v8::String and v8::Utf8Value in Rust