fix: escape non-ASCII characters as \uNNNN in string literals #11
Merged
Brooooooklyn merged 1 commit into main on Feb 5, 2026
Match TypeScript's emitter behavior by escaping all non-ASCII characters (code point > 0x7E) as \uNNNN sequences. Characters above the BMP are emitted as UTF-16 surrogate pairs. Escapes are built with push_str and a hex table lookup instead of fmt::Write.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Note

Medium Risk
Changes emitted JavaScript string literal contents (non-ASCII now always becomes \u escapes), which can affect snapshot tests, output diffs, and any consumers expecting literal UTF-8 in generated code; logic is localized but touches core emission paths.

Overview
Updates escape_string in the Angular compiler JS emitter to always escape non-ASCII characters (and ASCII control chars) as \uNNNN, emitting UTF-16 surrogate pairs for code points above the BMP to match TypeScript's string-literal printer.

Adds a small helper (push_unicode_escape) to build escapes without formatting, expands unit coverage around surrogate pairs and non-ASCII cases, and updates integration assertions so compiled templates with HTML entities (e.g. ×, ) now produce escaped sequences instead of literal Unicode characters.

Written by Cursor Bugbot for commit 8d880a6.
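
For readers who want to see the escaping rule in isolation, here is a minimal sketch of the approach described above. It is not the code from this PR. The helper name push_unicode_escape comes from the summary, but its signature here is an assumption, and escape_non_ascii is a hypothetical stand-in for escape_string. The uppercase hex digits are also an assumption, and the sketch deliberately leaves out quote, backslash, and newline handling, which a real string-literal emitter would need.

```rust
/// Hex digit lookup table, so escapes can be built without `format!`.
/// Uppercase digits are an assumption of this sketch.
const HEX: &[u8; 16] = b"0123456789ABCDEF";

/// Appends `\uNNNN` for a single UTF-16 code unit.
/// (Assumed signature; the PR only names the helper.)
fn push_unicode_escape(out: &mut String, unit: u16) {
    out.push_str("\\u");
    // Emit the four nibbles from most to least significant.
    for shift in [12u32, 8, 4, 0] {
        out.push(HEX[((unit >> shift) & 0xF) as usize] as char);
    }
}

/// Hypothetical stand-in for `escape_string`: printable ASCII
/// (0x20..=0x7E) passes through, everything else becomes `\uNNNN`.
fn escape_non_ascii(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        if (0x20..=0x7E).contains(&(ch as u32)) {
            out.push(ch);
        } else {
            // `encode_utf16` yields one unit for BMP characters and a
            // surrogate pair for code points above U+FFFF.
            let mut units = [0u16; 2];
            for unit in ch.encode_utf16(&mut units) {
                push_unicode_escape(&mut out, *unit);
            }
        }
    }
    out
}
```

Under these assumptions, `escape_non_ascii("a×b")` yields `a\u00D7b`, and a character above the BMP such as U+1F600 yields the pair `\uD83D\uDE00`.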