Soft Hyphens and Zero Width Spaces #806
Comments
That's an interesting proposition. I can't imagine it would be the most challenging ask. I'm curious: how do these characters make their way into your documents?
There is another issue with hyphens. In some formats, especially plain text or files automatically converted from plain text (think Project Gutenberg), hyphens are used at the ends of lines to split a word between syllables or morphemes, so the real word needs to be reconstructed from its two halves before it can take part in spelling and grammar checks. But some of the words this happens to are genuinely hyphenated compounds, so there is an ambiguity. It's similar to a sentence that ends with an abbreviation that itself ends with a period. We won't have to worry about this for some time though.
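To make the ambiguity concrete, here is a rough sketch in Python of one possible heuristic. Everything in it is an assumption for illustration (the function name `rejoin_line_end_hyphens`, the plain `set` used as a dictionary, the punctuation stripped before lookup); it is not how Harper handles this.

```python
def rejoin_line_end_hyphens(lines: list[str], dictionary: set[str]) -> str:
    """Join wrapped lines, resolving each trailing hyphen with a dictionary lookup.

    Hypothetical heuristic: if the two halves form a known word without the
    hyphen, drop the hyphen; otherwise keep it and assume a hyphenated compound.
    Line breaks are flattened to single spaces.
    """
    out: list[str] = []
    carry = ""  # a word fragment ending in "-" left over from the previous line
    for line in lines:
        words = line.split()
        if carry and words:
            head = words.pop(0)
            joined = carry[:-1] + head
            # Ignore surrounding punctuation when consulting the dictionary.
            key = joined.strip(".,;:!?\"'()").lower()
            words.insert(0, joined if key in dictionary else carry + head)
            carry = ""
        if words and len(words[-1]) > 1 and words[-1].endswith("-"):
            carry = words.pop()
        out.extend(words)
    if carry:
        out.append(carry)
    return " ".join(out)
```

A lookup like this still cannot decide cases where both readings are valid words (for example "re-cover" and "recover"), which is the ambiguity described above.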
I just learned about these and am in the exploration phase, so I don't have a set workflow and am not using them often. I'm trying to find out whether it would make sense to use Unicode in documentation if the tooling around it were better. I know this is long; it's just for those who are curious.

Typst supports using

Since I don't want to take the time to memorize Unicode values to enter into Neovim using the

Reading text with Unicode characters in Neovim isn't great either (

For Typst,

#set page(margin:.1in, width:2in, height:1in,)
// Either work
// #show "supercalifragilistic": "supercalifragilistic" // There are soft hyphens that don't show in github
#show "supercalifragilistic": [super#sym.hyph.soft;cali#sym.hyph.soft;fragilistic]
That is very supercalifragilistic.

For Word Joiners (that are not relevant to spelling) I wrote a Lua filter (unpublished) for Pandoc that can be used with

---
title: Word Joiner Example
words:
- 800-53
---
RMF Controls are documented in NIST SP 800-53.

The filter will read the list of words and utilize a Typst template to generate a

So, there are a lot of roadblocks to using these characters. Spell checking is just one obstacle to inserting soft hyphens and zero width spaces into text. I suspect many people don't know about the utility of these characters, and I don't see that changing significantly. It might just be too complex and confusing for those who don't have these tools set up, so I'm not sure how much I'll use them in the future.
U+00AD is a soft hyphen. It marks a point where a word may be broken if necessary, and a hyphen is displayed when the break occurs.
U+200B is a zero width space. It marks a point where a word may be broken if necessary, and no hyphen is displayed when the break occurs.
Both MS Word and harper-ls in Neovim fail to check the full word for spelling mistakes when either of these characters is used. Has there been any thought given to stripping them before checking against the dictionary?
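
To illustrate the idea of stripping before lookup, here is a minimal sketch in Python. It is not Harper's implementation; the function names, the simple regex tokenizer, and the plain `set` standing in for a dictionary are all assumptions made for the example.

```python
import re

# U+00AD SOFT HYPHEN and U+200B ZERO WIDTH SPACE, the two characters named above.
# Mapping a code point to None makes str.translate delete it.
INVISIBLE_BREAKS = {0x00AD: None, 0x200B: None}


def strip_invisible_breaks(text: str) -> str:
    """Return text with soft hyphens and zero width spaces removed."""
    return text.translate(INVISIBLE_BREAKS)


def misspelled_words(text: str, dictionary: set[str]) -> list[str]:
    """Hypothetical checker: strip the invisible characters, then look words up."""
    cleaned = strip_invisible_breaks(text)
    # Deliberately naive tokenizer, only for illustration.
    words = re.findall(r"[A-Za-z']+", cleaned)
    return [word for word in words if word.lower() not in dictionary]
```

A real checker would presumably also need to map positions in the cleaned text back to the original so that diagnostics still point at the right characters; the sketch ignores that.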