Suggestion: include Unicode codepoint standard names to sym.txt #6

Open
emilyyyylime opened this issue Nov 17, 2024 · 5 comments · May be fixed by #9
Labels
meta Discussion about the structure of this repo

Comments

@emilyyyylime
Collaborator

Many symbols in sym.txt are specified by their Unicode code point in the form U+XXXX rather than as a plain character, because the character itself would be hard to parse or notice when reading the file later. I believe using the Unicode-assigned names of such characters would be more useful and self-documenting than entering the bare code point.

Ideally, these names would be machine-checked in build.rs rather than serving only as informative comments, so that reviewers do not have to verify by hand that each character has the right name. The names could then also be used to look up the intended code point, entirely replacing the U+ scalar reference.
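As a rough illustration of machine-checked names, here is a minimal Rust sketch. It assumes a hypothetical sym.txt value syntax in which a symbol is written as a single literal character, as `U+XXXX`, or as its Unicode name, and it assumes the name table is built from the Unicode Character Database file `UnicodeData.txt` (semicolon-separated, code point in the first field, name in the second), for example a copy vendored into the repo. It uses only the standard library, so it would not add a dependency. The names `parse_unicode_data`, `resolve`, `check_name` and `SymError` are illustrative and not part of the repository.

```rust
use std::collections::HashMap;

/// Builds a name -> char table from text in the `UnicodeData.txt` format
/// (`CODEPOINT;NAME;...`). Placeholder names in angle brackets, such as
/// `<control>` or range markers, are skipped, so algorithmically derived
/// names (e.g. CJK ideographs) are not covered by this sketch.
fn parse_unicode_data(data: &str) -> HashMap<String, char> {
    let mut table = HashMap::new();
    for line in data.lines() {
        let mut fields = line.split(';');
        let (Some(cp), Some(name)) = (fields.next(), fields.next()) else {
            continue;
        };
        if name.starts_with('<') {
            continue;
        }
        if let Some(c) = u32::from_str_radix(cp, 16).ok().and_then(char::from_u32) {
            table.insert(name.to_string(), c);
        }
    }
    table
}

#[derive(Debug, PartialEq)]
enum SymError {
    BadScalar(String),
    UnknownName(String),
    NameMismatch { name: String, expected: char, found: char },
}

/// Resolves one value in the assumed syntax: `U+XXXX`, a single literal
/// character, or otherwise a Unicode character name.
fn resolve(value: &str, table: &HashMap<String, char>) -> Result<char, SymError> {
    if let Some(hex) = value.strip_prefix("U+") {
        return u32::from_str_radix(hex, 16)
            .ok()
            .and_then(char::from_u32)
            .ok_or_else(|| SymError::BadScalar(value.to_string()));
    }
    let mut chars = value.chars();
    if let (Some(c), None) = (chars.next(), chars.next()) {
        return Ok(c);
    }
    table
        .get(value)
        .copied()
        .ok_or_else(|| SymError::UnknownName(value.to_string()))
}

/// Checks that a name annotation refers to the character it is attached to.
fn check_name(c: char, name: &str, table: &HashMap<String, char>) -> Result<(), SymError> {
    match table.get(name) {
        None => Err(SymError::UnknownName(name.to_string())),
        Some(&expected) if expected != c => Err(SymError::NameMismatch {
            name: name.to_string(),
            expected,
            found: c,
        }),
        Some(_) => Ok(()),
    }
}

fn main() {
    // Two lines from UnicodeData.txt; a real check would load the whole file.
    let data = "2190;LEFTWARDS ARROW;Sm;0;ON;;;;;N;LEFT ARROW;;;;\n\
                2192;RIGHTWARDS ARROW;Sm;0;ON;;;;;N;RIGHT ARROW;;;;";
    let table = parse_unicode_data(data);
    assert_eq!(resolve("U+2192", &table), Ok('→'));
    assert_eq!(resolve("→", &table), Ok('→'));
    assert_eq!(resolve("RIGHTWARDS ARROW", &table), Ok('→'));
    assert_eq!(check_name('→', "RIGHTWARDS ARROW", &table), Ok(()));
    assert!(check_name('→', "LEFTWARDS ARROW", &table).is_err());
    println!("all checks passed");
}
```

Because `resolve` accepts a name wherever a `U+XXXX` scalar is accepted, the same table covers both ideas: validating a name annotation and replacing the scalar reference outright.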

Either way, we could opt to include the names even for characters that are embedded directly in the txt files, just to have more context available when editing them (though this is more of a bonus/personal-preference change and should be discussed separately).

@emilyyyylime emilyyyylime added the meta Discussion about the structure of this repo label Nov 17, 2024
@laurmaedje
Member

Note that I'd like to avoid adding dependencies in build.rs just for validation, as these will be pulled in by any user of the library.

@emilyyyylime
Collaborator Author

I see. Another approach is a dev-dependency, or even a separate validation/generation script outside of build.rs.

@emilyyyylime
Collaborator Author

emilyyyylime commented Nov 18, 2024

Or potentially keeping it behind a feature flag. I think that would work quite well. Would you be in favor of that? I'm asking so I can decide whether to start implementing it.

@MDLC01
Collaborator

MDLC01 commented Nov 18, 2024

It may make sense to allow specifying a Unicode name instead of using the U+ syntax, but I don't like the idea of having to specify the Unicode name even when the symbol alone could be used. It feels like annoying redundancy for people writing the PRs.

@emilyyyylime
Collaborator Author

That was certainly the intention: only to replace U+ characters, not verbatim ones. Potentially we could write a script to automatically insert the names for all characters? I'm not sure what the best approach is here.
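The script idea could look roughly like this minimal Rust sketch, which builds a character-to-name table from `UnicodeData.txt`-formatted text and appends the name as a trailing comment to each line whose last token is a single character or a `U+XXXX` scalar. The `identifier value` line shape and the `// NAME` comment syntax are assumptions for illustration; the real sym.txt conventions may differ, and `names_by_char`, `value_to_char` and `annotate_line` are hypothetical helpers.

```rust
use std::collections::HashMap;

/// Builds a char -> name table from `UnicodeData.txt`-formatted text,
/// skipping placeholder names in angle brackets such as `<control>`.
fn names_by_char(data: &str) -> HashMap<char, String> {
    let mut names = HashMap::new();
    for line in data.lines() {
        let mut fields = line.split(';');
        let (Some(cp), Some(name)) = (fields.next(), fields.next()) else {
            continue;
        };
        if name.starts_with('<') {
            continue;
        }
        if let Some(c) = u32::from_str_radix(cp, 16).ok().and_then(char::from_u32) {
            names.insert(c, name.to_string());
        }
    }
    names
}

/// Interprets a value as `U+XXXX` or as a single literal character.
fn value_to_char(value: &str) -> Option<char> {
    if let Some(hex) = value.strip_prefix("U+") {
        return u32::from_str_radix(hex, 16).ok().and_then(char::from_u32);
    }
    let mut chars = value.chars();
    match (chars.next(), chars.next()) {
        (Some(c), None) => Some(c),
        _ => None,
    }
}

/// Appends `// NAME` to a line of the assumed form `identifier value`.
/// Lines with fewer than two tokens, lines that already contain `//`,
/// and values without a known name are returned unchanged.
fn annotate_line(line: &str, names: &HashMap<char, String>) -> String {
    if line.contains("//") {
        return line.to_string();
    }
    let mut tokens = line.split_whitespace();
    let (Some(_ident), Some(value)) = (tokens.next(), tokens.last()) else {
        return line.to_string();
    };
    match value_to_char(value).and_then(|c| names.get(&c)) {
        Some(name) => format!("{line} // {name}"),
        None => line.to_string(),
    }
}

fn main() {
    let names = names_by_char(
        "2190;LEFTWARDS ARROW;Sm;0;ON;;;;;N;LEFT ARROW;;;;\n\
         2192;RIGHTWARDS ARROW;Sm;0;ON;;;;;N;RIGHT ARROW;;;;",
    );
    let input = "arrow\n  .r →\n  .l U+2190";
    for line in input.lines() {
        println!("{}", annotate_line(line, &names));
    }
}
```

With the two sample entries, `main` leaves `arrow` unchanged and prints `  .r → // RIGHTWARDS ARROW` and `  .l U+2190 // LEFTWARDS ARROW`. Skipping lines that already contain `//` keeps the script safe to rerun.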

@MDLC01 MDLC01 linked a pull request Nov 23, 2024 that will close this issue
@emilyyyylime emilyyyylime linked a pull request Nov 23, 2024 that will close this issue
3 participants