RegExp unicode mode differs from Bun runtime #28587

Closed
kevgeoleo opened this issue Mar 21, 2025 · 3 comments
Labels
invalid: what appeared to be an issue with Deno wasn't

Comments

@kevgeoleo

kevgeoleo commented Mar 21, 2025

Version: Deno 2.2.5

Hi,

I would like to report an issue I encountered in Deno while I was running the following code snippet:

console.log(new RegExp("\\ud800\udc00+", "u").exec("\u{10000}\u{10000}"));

Deno gives null as output whereas Bun gives [ "𐀀", index: 0, input: "𐀀𐀀", groups: undefined ]

Regards,
Kevin

@dsherret
Member

dsherret commented Mar 21, 2025

I think Deno is following the spec here and Bun is not. For example, this test262 passes in Deno, but fails in Bun: https://github.com/tc39/test262/blob/ce7e72d2107f99d165f4259571f10aa75753d997/test/staging/sm/RegExp/unicode-raw.js#L56 -- I'm not super familiar with the spec or the unicode modifier so can't say for sure though.

Chrome, Firefox, Node and Deno all seem to output null in this case.

@0f-0b
Contributor

0f-0b commented Mar 21, 2025

The string "\\ud800\udc00+" consists of, in order, a \u escape of a high surrogate, a literal low surrogate, and the repetition operator +. In Unicode mode, a \u escape followed by a literal character does not form a surrogate pair, and an unpaired surrogate in the pattern matches only an unpaired surrogate in the input string. The regex therefore requires an unpaired high surrogate followed by one or more unpaired low surrogates, which is impossible: in Unicode mode the input is read as code points, so an adjacent high and low surrogate are always treated as a single code point. That's why exec on this regex always returns null.
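To make the explanation above concrete, here is a small sketch that runs the same pattern source three ways: with the u flag, without it, and with both surrogates written as \u escapes. The variable names are made up for illustration, and the results in the comments are what a spec-conforming engine (such as V8 in Deno or Node) is expected to produce.

```javascript
// Pattern source after JS string-literal processing:
// the six characters \ud800, then a literal U+DC00 code unit, then "+".
const source = "\\ud800\udc00+";
const input = "\u{10000}\u{10000}";

// u flag: the input is read as code points (two times U+10000), so it
// contains no lone surrogates and the pattern cannot match.
const unicodeResult = new RegExp(source, "u").exec(input);

// No u flag: matching works on UTF-16 code units, so 0xD800 followed by
// one or more 0xDC00 matches the first encoded U+10000.
const legacyResult = new RegExp(source).exec(input);

// Both halves written as escapes: in u mode \ud800\udc00 forms a surrogate
// pair, so the atom is U+10000 and "+" repeats the whole code point.
const pairedResult = new RegExp("\\ud800\\udc00+", "u").exec(input);

console.log(unicodeResult);                  // null
console.log(legacyResult[0] === "\u{10000}"); // true
console.log(pairedResult[0] === input);       // true
```

The third case is probably what the original snippet intended: escaping both backslashes in the string literal lets the regex parser see the pair as one escape sequence.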

The bug lies in JSC's JIT compiler. If you set the environment variable JSC_useRegExpJIT to 0 when running Bun, the bug does not occur.

dsherret added the invalid label (what appeared to be an issue with Deno wasn't) on Mar 22, 2025
@dsherret
Member

@0f-0b thanks for explaining. Going to close.

dsherret closed this as not planned (won't fix, can't repro, duplicate, stale) on Mar 22, 2025