Incorrect rendering of more complex Unicode text #2517

Open
Erutuon opened this issue Dec 27, 2022 · 1 comment
Labels: bug (Something is broken)

Comments


Erutuon commented Dec 27, 2022

Describe the bug
Right-to-left code points, cursive code points, and some grapheme clusters are rendered incorrectly on the Windows native platform.

I made a little Rust project on Windows that registers a hotkey to pop up a window where I can enter and identify Unicode characters. I work on Wiktionary, and egui seemed like the easiest GUI library to use for this project because I don't have to figure out keyboard event handlers and it has good defaults for the UI elements. egui handles Unicode data just fine, but it has problems rendering text in which multiple code points map to a single glyph or in which glyphs depend on context. This isn't a dealbreaker for me because I'm just identifying characters, but it makes egui unusable for people writing GUIs in certain languages, like Arabic or Hindi.

The screenshot at the bottom shows four samples: a stack of two combining diacritics in the Gentium Plus font, three Arabic letters in Scheherazade, the word "Hangeul" in Noto Sans CJK KR, and a random Hindi word from Wiktionary in Siddhanta: ế ابج 한글 अत्यधिक. The combining diacritics render on top of each other, the Arabic letters render left-to-right as disconnected letters, the Hangul letters render separately, the consonant cluster त्य is rendered as three glyphs (a letter, a diacritic under the letter, and another letter, looking like त्‌य), and in the consonant-vowel combination धि the vowel ि is placed after the consonant (like ध ि but without the space).

In my Firefox, which renders these correctly, the second combining diacritic will be above or to the right of the first, the Arabic letters are rendered cursive and right-to-left, the Hangul letters are rendered as their syllable block versions 한글, त्य is a single glyph, and धि has the vowel positioned before the consonant.

It looks like egui basically renders the code points separately and then overlays them if they are in a grapheme cluster. I guess proper rendering requires breaking the text into grapheme clusters and then rendering each cluster with any left or right joining behavior taken into account based on the neighboring characters.
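To illustrate the gap between code points and what should be drawn as one unit, here is a minimal std-only sketch. The names `code_points`, `approx_clusters`, `STACKED`, and `HINDI` are hypothetical, and the sample strings are written as escapes on the assumption that the text is stored decomposed; the actual text in the screenshot may be encoded differently. The cluster count is a crude stand-in that only knows the Combining Diacritical Marks block; it is not a real grapheme segmenter and says nothing about how egui lays out text.

```rust
/// "e" followed by a combining circumflex and a combining acute
/// (assumed encoding of the stacked-diacritics sample).
const STACKED: &str = "e\u{0302}\u{0301}";

/// The Hindi sample, spelled out code point by code point:
/// a, ta, virama, ya, dha, vowel sign i, ka.
const HINDI: &str = "\u{0905}\u{0924}\u{094D}\u{092F}\u{0927}\u{093F}\u{0915}";

/// Format every Unicode scalar value in `text` as "U+XXXX".
fn code_points(text: &str) -> Vec<String> {
    text.chars().map(|c| format!("U+{:04X}", c as u32)).collect()
}

/// True only for the Combining Diacritical Marks block (U+0300..=U+036F).
/// Marks in other blocks, such as Devanagari signs, are deliberately ignored.
fn is_combining_diacritic(c: char) -> bool {
    ('\u{0300}'..='\u{036F}').contains(&c)
}

/// Count code points that are not combining diacritics.
/// For simple Latin text this approximates the number of grapheme clusters.
fn approx_clusters(text: &str) -> usize {
    text.chars().filter(|c| !is_combining_diacritic(*c)).count()
}
```

With these definitions, `code_points(STACKED)` yields three entries while `approx_clusters(STACKED)` is 1, so a renderer that draws one glyph per code point has to decide where the second mark goes. `code_points(HINDI)` yields seven entries, but a shaped rendering draws fewer visual units, because the cluster with the virama becomes one glyph and the vowel sign is reordered. A real implementation would presumably need proper segmentation (Unicode Standard Annex #29) plus a shaping step, which this sketch does not attempt.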

This at least affects the Windows native renderer, and maybe others depending on how much of the text rendering is shared among the different platforms. Replicating is probably as simple as pasting the text into a UI element, but I can put my code up on GitHub if you want.

Desktop (please complete the following information):

  • OS: Windows

[Screenshot: the sample text as rendered by egui]

Erutuon added the bug label on Dec 27, 2022
parasyte (Contributor) commented

Probably related, FWIW: #56
