Suggestion: include Unicode codepoint standard names to sym.txt #6

emilyyyylime · 2024-11-17T21:42:20Z

Many symbols in sym.txt are specified by their Unicode code point in the form U+XXXX rather than as a plain character, because the character itself would be hard to make out or notice when reading the file later. I believe using the Unicode-assigned name of such characters would be more useful and self-documenting than the bare code point.
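For reference, this is a minimal sketch of what resolving a `U+XXXX` entry involves. It assumes the escape is always written as `U+` followed by 4 to 6 hex digits. The function name `parse_codepoint` is hypothetical and not taken from the repo's build.rs.

```rust
/// Parse a code point written as `U+XXXX` (4 to 6 hex digits) into a `char`.
/// Returns `None` for malformed input or for values that are not Unicode
/// scalar values (e.g. surrogates such as U+D800).
fn parse_codepoint(s: &str) -> Option<char> {
    let hex = s.strip_prefix("U+")?;
    // Reject anything that is not plain hex; `from_str_radix` alone would
    // also accept a leading '+' sign.
    if !(4..=6).contains(&hex.len()) || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let value = u32::from_str_radix(hex, 16).ok()?;
    char::from_u32(value)
}
```

For example, `parse_codepoint("U+00A0")` yields the no-break space, a character that is practically invisible when written verbatim. The escape by itself says nothing about which character it is, which is the gap a name would fill.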

Ideally, these names would be machine-checked in build.rs rather than just acting as informative comments, so that reviewers don't have to verify that the right name is given for each character. The names could then also be used to look up the intended code point, replacing the U+ reference entirely.
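A rough sketch of the two checks described above, assuming a name table is available at build time. The three-entry `NAMES` table is a stand-in: the names are the real Unicode names for those characters, but an actual check would need every entry from the Unicode Character Database (UnicodeData.txt). `check_name` and `char_by_name` are hypothetical helpers. Matching is exact on the canonical uppercase names.

```rust
/// Tiny stand-in for the full Unicode name table (hypothetical; a real
/// check would be generated from UnicodeData.txt).
const NAMES: &[(char, &str)] = &[
    ('\u{00A0}', "NO-BREAK SPACE"),
    ('\u{200B}', "ZERO WIDTH SPACE"),
    ('\u{2192}', "RIGHTWARDS ARROW"),
];

/// Check that the name written next to a character is its Unicode name.
fn check_name(c: char, written: &str) -> Result<(), String> {
    match NAMES.iter().find(|(ch, _)| *ch == c) {
        Some((_, expected)) if *expected == written => Ok(()),
        Some((_, expected)) => Err(format!(
            "U+{:04X}: expected name {expected:?}, found {written:?}",
            c as u32
        )),
        None => Err(format!("U+{:04X}: not in name table", c as u32)),
    }
}

/// Look up a character by its Unicode name, so a name could stand in
/// for the U+XXXX escape.
fn char_by_name(name: &str) -> Option<char> {
    NAMES.iter().find(|(_, n)| *n == name).map(|(c, _)| *c)
}
```

The first function covers the "machine-checked comment" variant and the second covers the "name replaces the escape" variant. Both need the same table, which is where the dependency question comes in.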

Either way, we could also include names on characters that are embedded directly in the txt files, just to have more context available when editing them (though this is more of a bonus/personal-preference change and should be discussed separately).


laurmaedje · 2024-11-18T08:38:53Z

Note that I'd like to avoid adding dependencies to build.rs just for validation, as these would be pulled in by every user of the library.

emilyyyylime · 2024-11-18T10:11:32Z

I see. Another approach would be a dev-dependency, or even a validation/generation script separate from build.rs.
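As a sketch of how such a standalone script could get its name table without any crate dependency, it could read UnicodeData.txt directly. In that file, each line has semicolon-separated fields: field 0 is the code point in hex and field 1 is the name. The function name is hypothetical, and this simplified version skips entries whose "name" is in angle brackets (`<control>`, range markers such as `<CJK Ideograph, First>`). As a result, it does not produce algorithmically derived names like those of CJK ideographs or Hangul syllables.

```rust
use std::collections::HashMap;

/// Build a name -> character map from the contents of UnicodeData.txt.
fn parse_unicode_data(data: &str) -> HashMap<String, char> {
    let mut map = HashMap::new();
    for line in data.lines() {
        let mut fields = line.split(';');
        // Field 0: code point in hex; field 1: character name.
        let (Some(cp), Some(name)) = (fields.next(), fields.next()) else {
            continue; // empty or malformed line
        };
        // Skip placeholders such as `<control>` and range markers.
        if name.is_empty() || name.starts_with('<') {
            continue;
        }
        if let Some(c) = u32::from_str_radix(cp, 16).ok().and_then(char::from_u32) {
            map.insert(name.to_string(), c);
        }
    }
    map
}
```

Since this would run only in a dev script or test, the library's users would not pay for it. The trade-off is that the data file has to be vendored or downloaded by whoever runs the check.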

emilyyyylime · 2024-11-18T11:13:57Z

Or potentially keeping it behind a feature flag; I think that would work quite well. Would you be in favor of that? I'm asking so I can decide whether to start implementing it.

MDLC01 · 2024-11-18T13:47:33Z

It may make sense to allow specifying a Unicode name instead of using the U+ syntax, but I don't like the idea of having to specify the Unicode name even when the symbol alone could be used. It feels like annoying redundancy for people writing the PRs.

emilyyyylime · 2024-11-19T12:20:10Z

That was certainly the intention: to replace only U+ escapes, not verbatim characters. Potentially we could write a script to insert the names for all characters automatically? I'm not sure what the best approach would be here.
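To make the "names only as an alternative to U+" idea concrete, here is a sketch of how a single value could be resolved. The three-way syntax is hypothetical: a lone character is taken verbatim, a `U+` prefix marks an escape, and anything else is treated as a Unicode name. `resolve` is an illustrative name, and the name table is passed in as a `lookup` function so the sketch doesn't assume where the table comes from.

```rust
/// Resolve a sym.txt value under a hypothetical syntax: a verbatim
/// character, a `U+XXXX` escape, or a Unicode name.
fn resolve(value: &str, lookup: impl Fn(&str) -> Option<char>) -> Result<char, String> {
    let mut chars = value.chars();
    // A single character is used verbatim and needs no name.
    if let (Some(c), None) = (chars.next(), chars.next()) {
        return Ok(c);
    }
    if let Some(hex) = value.strip_prefix("U+") {
        // Only plain hex digits are accepted after the prefix.
        if hex.is_empty() || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
            return Err(format!("invalid code point: {value}"));
        }
        return u32::from_str_radix(hex, 16)
            .ok()
            .and_then(char::from_u32)
            .ok_or_else(|| format!("invalid code point: {value}"));
    }
    // Anything else is treated as a Unicode name.
    lookup(value).ok_or_else(|| format!("unknown Unicode name: {value}"))
}
```

With this shape, PR authors who write the symbol directly never have to supply a name. A name only comes into play where a `U+` escape would otherwise be used.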

emilyyyylime added the "meta" label ("Discussion about the structure of this repo") on Nov 17, 2024

MDLC01 and emilyyyylime linked pull request #9, "Implement parsing Unicode names" (open), on Nov 23, 2024; it will close this issue.
