Incorrect rendering of more complex Unicode text #2517

Erutuon · 2022-12-27T18:45:48Z

Describe the bug
Right-to-left code points, cursive (joining) code points, and some grapheme clusters are rendered incorrectly on the Windows native platform.

I made a little Rust project on Windows that registers a hotkey to pop up a window where I can enter and identify Unicode characters. I work on Wiktionary, and egui seemed like the easiest GUI library for this project because I don't have to figure out keyboard event handlers and it has good defaults for the UI elements. egui handles Unicode data just fine, but it has problems rendering text where several code points are drawn as a single glyph or where a glyph's shape depends on its neighbors. This isn't a dealbreaker for me because I'm only identifying characters, but it makes egui unusable for people writing GUIs in languages such as Arabic or Hindi.

The screenshot at the bottom shows a stack of two combining diacritics rendered in the Gentium Plus font, three Arabic letters in Scheherazade, the word "Hangeul" in Noto Sans CJK KR, and a random Hindi word from Wiktionary in Siddhanta: ế ابج 한글 अत्यधिक. The combining diacritics render on top of each other, and the Arabic letters render left-to-right as disconnected letters. The Hangul letters render separately. In the Hindi word, the consonant cluster त्य is rendered as three glyphs (a letter, a diacritic under the letter, and another letter, looking like त्‌य), and in the consonant-vowel combination धि the vowel ि is placed after the consonant (like ध ि but without the space).
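To check exactly which code points a sample contains, a small standard-library-only Rust helper can list them. This is a sketch: the escapes below assume the ế was typed in decomposed form as e + U+0302 COMBINING CIRCUMFLEX ACCENT + U+0301 COMBINING ACUTE ACCENT, which the issue implies but does not state. A renderer that emits one glyph per `char` would draw three separate glyphs for that sequence. The function name `code_points` is hypothetical.

```rust
/// Lists each `char` in `text` as a `U+XXXX` code point label.
fn code_points(text: &str) -> Vec<String> {
    text.chars().map(|c| format!("U+{:04X}", c as u32)).collect()
}

fn main() {
    // Assumed decomposed form of "ế": base letter plus two combining marks.
    let e_stack = "e\u{0302}\u{0301}";
    // The Hindi sample from the issue, written with escapes for clarity.
    let hindi = "\u{0905}\u{0924}\u{094D}\u{092F}\u{0927}\u{093F}\u{0915}";
    for sample in [e_stack, hindi] {
        println!("{} -> {}", sample, code_points(sample).join(" "));
    }
}
```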

In my Firefox, which renders these correctly, the second combining diacritic is placed above or to the right of the first, the Arabic letters are rendered cursive and right-to-left, the Hangul letters are rendered as their syllable block versions 한글, त्य is a single glyph, and धि has the vowel positioned before (to the left of) the consonant.
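If the "Hangeul" sample was typed as conjoining jamo (an assumption; the issue does not say how it was entered), the syllable blocks Firefox shows correspond to the precomposed syllables produced by the Hangul composition arithmetic in Chapter 3 of the Unicode Standard. A minimal Rust sketch of that arithmetic, handling only leading consonant + vowel + optional trailing consonant sequences (`compose_hangul` is a hypothetical name):

```rust
// Constants from the Unicode Standard's Hangul syllable composition algorithm.
const S_BASE: u32 = 0xAC00;
const L_BASE: u32 = 0x1100;
const V_BASE: u32 = 0x1161;
const T_BASE: u32 = 0x11A7; // one below the first trailing consonant
const L_COUNT: u32 = 19;
const V_COUNT: u32 = 21;
const T_COUNT: u32 = 28;

/// Composes L+V and L+V+T conjoining-jamo sequences into precomposed syllables.
/// Any other character is copied through unchanged.
fn compose_hangul(text: &str) -> String {
    let chars: Vec<char> = text.chars().collect();
    let mut out = String::new();
    let mut i = 0;
    while i < chars.len() {
        let l = chars[i] as u32;
        let v = chars.get(i + 1).map(|&c| c as u32);
        match v {
            Some(v)
                if (L_BASE..L_BASE + L_COUNT).contains(&l)
                    && (V_BASE..V_BASE + V_COUNT).contains(&v) =>
            {
                let mut s = S_BASE + ((l - L_BASE) * V_COUNT + (v - V_BASE)) * T_COUNT;
                i += 2;
                // Optional trailing consonant (U+11A8..=U+11C2).
                if let Some(&t) = chars.get(i) {
                    let t = t as u32;
                    if t > T_BASE && t < T_BASE + T_COUNT {
                        s += t - T_BASE;
                        i += 1;
                    }
                }
                // Every value computed here lies in U+AC00..=U+D7A3, so this cannot fail.
                out.push(char::from_u32(s).unwrap());
            }
            _ => {
                out.push(chars[i]);
                i += 1;
            }
        }
    }
    out
}

fn main() {
    // Conjoining jamo for "한글": HIEUH, A, NIEUN, KIYEOK, EU, RIEUL.
    let jamo = "\u{1112}\u{1161}\u{11AB}\u{1100}\u{1173}\u{11AF}";
    println!("{}", compose_hangul(jamo)); // prints 한글
}
```

Composition is only one way to get syllable blocks: a shaping engine can also form them directly from the jamo using the font's OpenType features, so this illustrates the expected result rather than how Firefox produces it.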

It looks like egui renders the code points separately and then overlays them if they belong to the same grapheme cluster, whereas I guess proper rendering requires breaking the text into grapheme clusters and then rendering each cluster with its left or right joining behavior determined by the neighboring characters.
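The first step described above, grouping code points into clusters, can be roughly approximated by attaching combining marks to the preceding base character. The sketch below is a simplified stand-in for extended grapheme clusters (UAX #29): it assumes only the hand-picked mark ranges listed, and real code would more likely use a crate such as `unicode-segmentation`. With this grouping the Hindi sample splits into अ, त्, य, धि, क; newer versions of UAX #29 (Unicode 15.1) also keep conjuncts like त्य in one cluster. Grouping alone does not fix rendering: Arabic joining forms, the conjunct glyph for त्य, and the visual reordering of ि come from a shaping step (for example HarfBuzz), and right-to-left ordering comes from the Unicode Bidirectional Algorithm (UAX #9). The function names are hypothetical.

```rust
/// Returns true for a hand-picked set of combining marks (categories Mn/Mc)
/// in the Latin, Arabic and Devanagari ranges used in this issue.
/// This is NOT a full Unicode property table.
fn is_combining_mark(c: char) -> bool {
    matches!(
        c as u32,
        0x0300..=0x036F       // Combining Diacritical Marks
            | 0x064B..=0x065F // Arabic harakat and related marks
            | 0x0900..=0x0903 // Devanagari candrabindu, anusvara, visarga
            | 0x093A..=0x093C // Devanagari vowel signs OE/OOE, nukta
            | 0x093E..=0x094F // Devanagari dependent vowel signs and virama
            | 0x0951..=0x0957 // Devanagari stress signs and vowel signs
            | 0x0962..=0x0963 // Devanagari vowel signs vocalic L/LL
    )
}

/// Groups each base character with the combining marks that follow it.
/// A simplified approximation of extended grapheme clusters, not UAX #29.
fn naive_clusters(text: &str) -> Vec<String> {
    let mut clusters: Vec<String> = Vec::new();
    for c in text.chars() {
        if is_combining_mark(c) {
            if let Some(last) = clusters.last_mut() {
                last.push(c);
                continue;
            }
        }
        // Base character, or a mark with nothing before it: start a new cluster.
        clusters.push(c.to_string());
    }
    clusters
}

fn main() {
    let e_stack = "e\u{0302}\u{0301}";
    let hindi = "\u{0905}\u{0924}\u{094D}\u{092F}\u{0927}\u{093F}\u{0915}";
    for sample in [e_stack, hindi] {
        println!("{:?}", naive_clusters(sample));
    }
}
```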

This at least affects the Windows native renderer, and maybe others depending on how much of the text rendering is shared among the different platforms. Replicating is probably as simple as pasting the text into a UI element, but I can put my code up on GitHub if you want.

Desktop:

OS: Windows


parasyte · 2022-12-29T04:09:43Z

Probably related, FWIW: #56

Erutuon added the "bug" label (Something is broken) on Dec 27, 2022

parasyte mentioned this issue Dec 30, 2022

epaint::text::Glyph should own a grapheme cluster, not a char #2532 (Open)

njust mentioned this issue Oct 18, 2023

Utf8 support njust/kubelog#7 (Open)
