add rust backed node buffer implementation by anonrig · Pull Request #6360 · cloudflare/workerd

anonrig · 2026-03-19T13:59:34Z

Built on top of #6272 branch. Not yet ready to review & merge.

ask-bonk

I'm Bonk, and I've done a quick review of your PR.

This PR adds a Rust-backed BufferUtil implementation behind the RUST_BACKED_NODE_BUFFER autogate, using the simdutf Rust crate linked against workerd's vendored simdutf C++ library. The autogate-gated approach is the right pattern for this kind of migration.

Issues (ranked by severity)

[High] to_string: panics on start > end — if start.value() exceeds end.value() (after clamping), &data[s..e] panics at runtime. Untrusted JS can trigger this.
[High] fill_impl: panics on start > end or start >= buffer.len() — data[start..end].fill(0) panics if start > end, and data[i] panics if start >= buffer.len() when end is also unclamped for start.
[Medium] index_of_offset: length as i32 wraps for buffers > 2 GiB — let len = length as i32 silently wraps to negative for buffers with length > i32::MAX. While V8 TypedArray limits make this unlikely in practice, it contradicts the #![expect(clippy::cast_sign_loss)] rationale that "all casts are checked or bounded by the caller".
[Medium] Massive code duplication between to_string and to_string_for_encoding — every encoding branch is copy-pasted between these two functions. A single function with a slice parameter would eliminate ~70 lines of duplicated unsafe code.
[Low] decode_string for Base64/Base64url: duplicated branches — the two branches are identical except for Base64Options::Default vs Base64Options::Url. Could be unified.
[Low] byte_length ignores encoding — the C++ byteLength takes a JsString (with encoding semantics). The Rust version accepts a String and returns str.len() (UTF-8 byte length), which is only correct for UTF-8 encoding. Other encodings would produce wrong results.

src/rust/api/buffer.rs

ask-bonk · 2026-03-19T14:06:12Z

I'm Bonk, and I've done a quick review of your PR.

This PR adds a Rust-backed BufferUtil implementation behind the RUST_BACKED_NODE_BUFFER autogate, using the simdutf Rust crate linked against workerd's vendored simdutf C++ library. The autogate-gated approach is correct for this kind of migration.

I posted a review with 6 inline comments on PR #6360. The key findings ranked by severity:

[High] to_string panics on start > end -- untrusted JS can trigger a runtime panic by passing start > end
[High] fill_impl panics on start > end or out-of-bounds start -- same class of issue
[Medium] index_of_offset silently wraps length via as i32 for buffers > 2 GiB
[Medium] ~70 lines of duplicated unsafe simdutf code between to_string and to_string_for_encoding
[Low] Duplicated Base64/Base64url branches in decode_string
[Low] byte_length ignores encoding -- returns UTF-8 byte length regardless of encoding

Each comment includes a concrete suggestion where applicable.

github run

codspeed-hq · 2026-03-19T15:06:02Z

Merging this PR will not alter performance

✅ 70 untouched benchmarks
⏩ 129 skipped benchmarks¹

_{Comparing yagiz/add-buffer-rust (1d5c469) with main (b4fc0eb)}

129 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩

jasnell · 2026-03-19T17:27:55Z

/bigbonk review carefully specifically looking for definite or possible breaking behavioral changes. identify anything that could potentially break user assumptions

ask-bonk

Follow-up review after deeper analysis. These are new findings beyond my initial review.

[Medium] index_of uses naive O(n*m) search vs C++ SearchString (Boyer-Moore-Horspool) — likely source of CodSpeed regression
[Medium] decode_string ASCII/Latin1 combined branch has wrong ASCII semantics
[Medium] nbytes::base64_decode doesn't distinguish Base64 from Base64url encoding
[Low] decode UTF-16LE surrogate detection bitmask is too narrow (faithfully ported from C++)
[Low] nbytes FFI does element-by-element copy instead of bulk memcpy

This review was generated by an AI code review assistant and may contain inaccuracies.

ask-bonk

This review was generated by an AI code review assistant and may contain inaccuracies.

src/rust/api/buffer.rs

ask-bonk

Follow-up deep review with additional findings beyond the initial review.

[Medium] index_of uses naive O(n*m) search vs C++ Boyer-Moore-Horspool — likely CodSpeed regression source
[Medium] decode_string ASCII branch has wrong semantics (should mask to 7 bits)
[Medium] decode_string does not distinguish Base64 from Base64url
[Low] decode UTF-16LE surrogate detection bitmask is too narrow (faithfully ported from C++)
[Low] nbytes FFI does element-by-element copy instead of bulk memcpy

This review was generated by an AI code review assistant and may contain inaccuracies.

ask-bonk · 2026-03-19T17:33:13Z

src/rust/api/buffer.rs

+        #[expect(clippy::cast_sign_loss, reason = "opt_offset verified non-negative")]
+        let start = opt_offset as usize;
+        if is_forward {
+            for i in start..=data.len().saturating_sub(needle.len()) {


Naive O(n*m) search — likely perf regression source: The C++ indexOf uses SearchString (Boyer-Moore-Horspool from buffer-string-search.h) for efficient substring search. This Rust version uses a byte-by-byte comparison loop, which is O(n*m) worst case. For large buffers with long needles, this is significantly slower.

This is likely the root cause of the CodSpeed simpleStringBody[Response] regression (-26.77%). Even if this specific benchmark does not call indexOf directly, the naive pattern matching approach suggests other hot paths may have similar regressions.

Consider:

memchr crate (SIMD-accelerated) for single-byte needles

data.windows(needle.len()).position(|w| w == needle) for a compiler-optimized multi-byte fallback

Porting SearchString (Boyer-Moore-Horspool) for parity with the C++ version

ask-bonk · 2026-03-19T17:33:13Z

src/rust/api/buffer.rs

+        Ok(match enc {
+            // ASCII and Latin1 both take the low byte of each Unicode code point.
+            // We convert from UTF-8 chars (not raw bytes) to get the correct code points.
+            Encoding::Ascii | Encoding::Latin1 => string.chars().map(|c| c as u8).collect(),


ASCII/Latin1 branches produce identical output, but ASCII should mask to 7 bits: The C++ Encoding::ASCII handling masks each byte with & 0x7f (see toStringImpl in buffer.c++). But this Rust decode_string combines both into a single branch that just truncates to u8 via c as u8. For Latin1, this is correct (low byte of Unicode code point). For ASCII, this is wrong — Node.js Buffer.from("\u00e9", "ascii") should produce 0x69 (= 0xe9 & 0x7f), not 0xe9.

Suggested change

Encoding::Ascii | Encoding::Latin1 => string.chars().map(|c| c as u8).collect(),

Encoding::Ascii => string.chars().map(|c| (c as u8) & 0x7f).collect(),

Encoding::Latin1 => string.chars().map(|c| c as u8).collect(),

ask-bonk · 2026-03-19T17:33:13Z

src/rust/api/buffer.rs

+                // Node.js uses nbytes::Base64Decode which is very lenient — it
+                // ignores invalid characters and decodes whatever valid base64
+                // chars are present.
+                nbytes::base64_decode(src)


Base64 and Base64url use the same decode path: nbytes::base64_decode(src) delegates to C++ nbytes::Base64Decode, which uses standard Base64 alphabet (+, /). For Base64url, the alphabet should be - and _ instead. Node.js Buffer.from(data, "base64url") should decode URL-safe base64.

The C++ decodeStringImpl also uses nbytes::Base64Decode for both, which happens to be lenient enough to accept both alphabets. If that leniency is intentional and tested, this is fine — but it would be good to add a comment explaining the intentional fallthrough, and ideally a test for URL-safe characters (- and _).

src/rust/api/buffer.rs

ask-bonk · 2026-03-19T17:33:13Z

src/rust/nbytes/ffi.c++

+  ::rust::Vec<uint8_t> result;
+  result.reserve(actual);
+  auto* data = reinterpret_cast<const uint8_t*>(buf.get());
+  std::copy(data, data + actual, std::back_inserter(result));


Element-by-element copy: std::back_inserter copies bytes one at a time. Since you already have a contiguous buffer, consider using a bulk copy:

Suggested change

std::copy(data, data + actual, std::back_inserter(result));

::rust::Vec<uint8_t> result;

result.reserve(actual);

auto* data = reinterpret_cast<const uint8_t*>(buf.get());

std::copy_n(data, actual, std::back_inserter(result));

Or even better, if rust::Vec supports it, construct from the range directly. The current code is equivalent to std::copy with iterators, which the compiler may optimize to memcpy — but it is not guaranteed.

ask-bonk · 2026-03-19T17:33:13Z

src/rust/api/buffer.rs

+        };
+
+        let data = buffer.as_slice();
+        let offset = byte_offset.value() as i32;


byte_offset f64→i32 truncation: byte_offset.value() as i32 silently truncates values outside the i32 range. JavaScript numbers can represent offsets larger than i32::MAX (e.g., 2147483648). The as i32 cast wraps to negative, changing search semantics. Consider:

let offset = byte_offset.value() as i64; // then use i64 throughout the offset computation

This is related to my earlier comment about length as i32, but byte_offset is also affected and is more likely to actually receive large values from user code.

ask-bonk · 2026-03-19T17:33:51Z

Review posted successfully on PR #6360. Here's a summary of what was done:

Initial review (already posted): 6 inline comments covering panics on untrusted input, silent i32 truncation, code duplication, and byteLength encoding mismatch.

Follow-up deep review (just posted): 6 additional inline comments with new findings:

Severity	Finding	Location
Medium	`index_of` uses naive O(n*m) search vs C++ Boyer-Moore-Horspool — likely CodSpeed regression source	`buffer.rs:517`
Medium	`decode_string` ASCII branch doesn't mask to 7 bits (combined with Latin1, but should differ)	`buffer.rs:371`
Medium	`decode_string` Base64/Base64url use same `nbytes::base64_decode` without distinguishing alphabets	`buffer.rs:393`
Low	`byte_offset` f64→i32 truncation in `index_of`	`buffer.rs:473`
Low	`nbytes` FFI element-by-element copy instead of bulk memcpy	`nbytes/ffi.c++:23`
Info	UTF-16LE surrogate bitmask self-corrected (initially thought wrong, verified correct)	`buffer.rs:819`

The ASCII semantics bug (buffer.rs:371) includes a suggested fix.

Limerick incoming!

A Buffer in Rust was porting along,
Its search loop was naively wrong,
With O(n times m),
It slowed every gem —
Boyer-Moore would have made it more strong!

github run

danlapid · 2026-03-19T19:29:25Z

src/rust/jsg/autogate/ffi.h

+
+// Thin wrapper around workerd::util::Autogate::isEnabled().
+// The key is a string matching the kebab-case name (e.g. "rust-backed-node-buffer").
+inline bool is_enabled(::rust::String key) {


autogate checks are supposed to be super cheap, this function is not super cheap nor a thing wrapper

Yes, this is not ideal and will be changed before merging. Just did a quick hack to see the test suite results.

jasnell · 2026-03-19T20:06:08Z

src/rust/api/buffer.rs

+
+/// Decodes hex, truncating at the first invalid hex pair (Node.js behavior).
+/// In strict mode, returns an error on invalid input instead of truncating.
+fn hex_decode(src: &[u8], strict: bool) -> Result<Vec<u8>, jsg::Error> {


can we have something other than bool args?

yes, I know the existing c++ code has bool args but that's just because I haven't gotten around to updating everything to use WD_STRONG_BOOL yet

I think we need this clippy rule to be enabled: fn_params_excessive_bools https://rust-lang.github.io/rust-clippy/master/index.html#fn_params_excessive_bools

I'm not sure how that addresses my ask? It's not about excessive bools, it's about bool args period. Is it not possible to have something like WD_STRONG_BOOL in rust?

src/rust/api/buffer.rs

jasnell · 2026-03-19T20:09:59Z

src/rust/api/buffer.rs

+    #[jsg_static_constant]
+    pub const BASE64URL: u8 = 5;
+    #[jsg_static_constant]
+    pub const HEX: u8 = 6;


Nit ... a comment indicating that these need to be kept in sync with the enum would be helpful.

Yeah, we should just use enum values rather than constant values here.

jasnell · 2026-03-19T20:13:26Z

src/rust/api/buffer.rs

+    pub const HEX: u8 = 6;
+
+    #[jsg_method]
+    pub fn is_ascii(&self, bytes: jsg::v8::Local<jsg::v8::Uint8Array>) -> bool {


Ideally I think long term we'd want to have an equivalent to jsg::Js* wrapper types in Rust like we do in C++. the jsg::v8::Local<...> pattern here is unfortunate.

I think we just don't need it because they're actually v8::XXX. v8::String stores a pointer to v8::ffi::String and there is no blocker to make v8::String act like jsg::JsString.

I'm not following what you're saying here... specifically, I don't know what you mean by, "there is no blocker to make v8::String act like jsg::JsString".

I should not have to think about the jsg::v8::Local<...> part of this at all as it just makes things more verbose and isn't a detail I really should have to worry about. A jsg::v8::JsUint8Array or jsg::v8::JsString, etc is less verbose and is just easier to work with.

I'm firmly -1 on the jsg::v8::Local<T> pattern unless there's an essential reason for keeping it, which I'm not seeing.

My recommendation is to keep this as is. Move the v8::* bindings to a different crate, and implement jsg::JsUint8Array or any other class using the v8 bindings and abstract the Local handle logic.

jasnell · 2026-03-25T14:41:20Z

src/rust/api/buffer.rs

+        byte_offset: jsg::Number,
+        encoding: jsg::Number,
+        is_forward: jsg::NonCoercible<bool>,
+    ) -> Result<jsg::Number, jsg::Error> {


The performance of this algorithm is going to be O(n*m) at the best. That going to be too much of a performance hit for this.

jasnell

This is going to need a bit more work. The index_of performance issues are significant, and I believe your unwrap_string impl looks like it is potentially swallowing errors. Also, there are multiple failing tests that will need to be looked at.

anonrig requested review from a team as code owners March 19, 2026 13:59

anonrig marked this pull request as draft March 19, 2026 14:00

ask-bonk bot reviewed Mar 19, 2026

View reviewed changes

anonrig force-pushed the yagiz/add-buffer-rust branch from 00d2f75 to 29a1e01 Compare March 19, 2026 14:50

anonrig force-pushed the yagiz/add-buffer-rust branch 2 times, most recently from ab8b578 to faea404 Compare March 19, 2026 17:01

ask-bonk bot reviewed Mar 19, 2026

View reviewed changes

src/rust/api/buffer.rs Outdated Show resolved Hide resolved

ask-bonk bot reviewed Mar 19, 2026

View reviewed changes

anonrig force-pushed the yagiz/add-buffer-rust branch 2 times, most recently from 24b7e96 to a4f1655 Compare March 19, 2026 18:51

anonrig force-pushed the yagiz/rust-gc-v2 branch from 97e306b to bcbf3ee Compare March 19, 2026 19:02

danlapid reviewed Mar 19, 2026

View reviewed changes

jasnell reviewed Mar 19, 2026

View reviewed changes

src/rust/api/buffer.rs Outdated Show resolved Hide resolved

jasnell reviewed Mar 19, 2026

View reviewed changes

anonrig force-pushed the yagiz/rust-gc-v2 branch 5 times, most recently from 677011e to 3e59d17 Compare March 20, 2026 22:47

Base automatically changed from yagiz/rust-gc-v2 to main March 20, 2026 23:31

anonrig force-pushed the yagiz/add-buffer-rust branch from a4f1655 to 0656546 Compare March 23, 2026 15:42

add rust backed node buffer implementation

1d5c469

anonrig force-pushed the yagiz/add-buffer-rust branch from 0656546 to 1d5c469 Compare March 23, 2026 15:54

This was referenced Mar 23, 2026

add several TypedArray methods to Rust JSG #6386

Merged

enable fn_params_excessive_bools clippy rule #6387

Merged

guybedford approved these changes Mar 24, 2026

View reviewed changes

jasnell reviewed Mar 25, 2026

View reviewed changes

jasnell requested changes Mar 25, 2026

View reviewed changes

	Encoding::Ascii \| Encoding::Latin1 => string.chars().map(\|c\| c as u8).collect(),
	Encoding::Ascii => string.chars().map(\|c\| (c as u8) & 0x7f).collect(),
	Encoding::Latin1 => string.chars().map(\|c\| c as u8).collect(),

Conversation

anonrig commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ask-bonk bot left a comment

Choose a reason for hiding this comment

Issues (ranked by severity)

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ask-bonk bot commented Mar 19, 2026

Uh oh!

codspeed-hq bot commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging this PR will not alter performance

Footnotes

Uh oh!

jasnell commented Mar 19, 2026

Uh oh!

ask-bonk bot left a comment

Choose a reason for hiding this comment

Uh oh!

ask-bonk bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ask-bonk bot left a comment

Choose a reason for hiding this comment

Uh oh!

ask-bonk bot Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

ask-bonk bot Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

ask-bonk bot Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ask-bonk bot Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

ask-bonk bot Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

ask-bonk bot commented Mar 19, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

anonrig Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

anonrig commented Mar 19, 2026 •

edited

Loading

codspeed-hq bot commented Mar 19, 2026 •

edited

Loading

anonrig Mar 23, 2026 •

edited

Loading

jasnell left a comment •

edited

Loading