@jquast jquast commented Feb 14, 2024

I did a few spot checks of VS-15 when implementing VS-16, and erroneously believed that all emojis in VS-15 sequences were already listed by EastAsianWidth.txt as width of 1.

But that's not true. There are several emojis that are "wide" that are changed to "narrow" with VS-15.
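
As a rough illustration of the rule this pull request adds, here is a minimal sketch, not the actual wcwidth code. The helper names `base_width` and `sketch_width` and the three-entry `VS15_WIDE_BASES` set are assumptions made up for illustration; the real table is generated from emoji-variation-sequences.txt and is much longer.

```python
import unicodedata

VS15 = "\ufe0e"  # Variation Selector-15, requests text presentation

# Hypothetical subset of wide characters that may be narrowed by VS-15:
# watch, hourglass, heavy large circle.
VS15_WIDE_BASES = {"\u231a", "\u231b", "\u2b55"}


def base_width(char):
    """Simplified stand-in for wcwidth.wcwidth(): 2 if East Asian Wide/Fullwidth, else 1."""
    return 2 if unicodedata.east_asian_width(char) in ("W", "F") else 1


def sketch_width(text):
    """Sum cell widths, narrowing a wide base character that is followed by VS-15."""
    width = 0
    last_measured = None
    for char in text:
        if char == VS15:
            if last_measured in VS15_WIDE_BASES:
                width -= 1  # wide (2 cells) becomes narrow (1 cell)
            last_measured = None
            continue  # the selector itself occupies no cell
        width += base_width(char)
        last_measured = char
    return width
```

The stand-in `base_width` ignores zero-width and combining characters entirely, so it is only suitable for showing the VS-15 adjustment itself.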

Reported by @rivo in muesli/reflow#73 (comment)

@rivo: you declare that our "Specification" is "missing some things", I would appreciate any further things that you find wrong.

@jquast jquast marked this pull request as ready for review February 14, 2024 20:10
@codecov

codecov bot commented Feb 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (056ee4b) 100.00% compared to head (f00fba5) 100.00%.

Additional details and impacted files
@@            Coverage Diff            @@
##            master      #120   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            5         6    +1     
  Lines          105       115   +10     
  Branches        25        28    +3     
=========================================
+ Hits           105       115   +10     


last_measured_char = None
idx += 1
continue
if char == u'\uFE0E' and last_measured_char:

u'\uFE0E' - maybe time to kill Python 2.7?

@rivo

rivo commented Feb 15, 2024

@rivo: you declare that our "Specification" is "missing some things", I would appreciate any further things that you find wrong.

I guess VS15 was the main one. I noticed other minor things like U+FF9E or U+FF9F which the grapheme cluster spec lists as "extending characters", thus width of 0 IMO but would get a width of 1 according to your algorithm, thus, e.g. ギ would result in a width of 2.

I'm also not sure what exactly this means:

Any character following ZWJ (U+200D) when in sequence by function wcwidth.wcswidth().

How do you determine when such a sequence ends? E.g. how do you determine the width of a string such as this one:

👩‍👩‍👦‍👦abc

I personally also think that characters such as U+2E3B should be much wider (width of 1 in your library). I've seen it span at least 4 cells in various macOS applications.

There may be more, or maybe not, I don't know. Validating your algorithm would be a lot of effort.

I put the word Specification in quotes because while you do note that it's a description of your implementation, I find that writing "I authored a formal Specification detailing how characters should be measured" makes it sound more authoritative than it is, especially in the context of assigning grades to other applications based on how well they "perform". In my comment, I was trying to make the point that there is no official specification and so far, every attempt I have seen has had flaws.

Even my own implementation is not perfect. Consider U+FDFD which both of our libraries will assign a width of 1:

U+FDFD: ﷽

In iTerm2, it spans 5 cells. In Chrome on macOS with a monospace font, it's even 11 cells wide.

It would be interesting to render out these characters on different platforms with the most common fonts and check how their widths compare to our calculated widths. This might reveal some general flaws or at the very least it would identify outliers such as U+FDFD.

I haven't had time for such a project yet, though.

@GalaxySnail
Collaborator

I noticed other minor things like U+FF9E or U+FF9F which the grapheme cluster spec lists as "extending characters", thus width of 0 IMO but would get a width of 1 according to your algorithm, thus, e.g. ギ would result in a width of 2.

Shouldn't the width of ギ be 2? This string is aligned with East Asian Wide characters in my web browser.

ギ一二三
一二三四

@rivo

rivo commented Feb 16, 2024

Shouldn't the width of ギ be 2?

This is U+FF77 and U+FF9E. U+FF77 is a half-width character, thus width=1. Is U+FF9E a separate character? The Grapheme Cluster Spec says it's an "extending character" and those typically don't take up extra space. They typically fall into the "Mn" category.

I'm aware that some fonts will still create extra space for them, which is likely why it looks ok in your browser. I guess it's nothing big to worry about. (That's why I wrote "minor" above.)
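
For anyone who wants to check the properties of these two code points, here is a small sketch using only Python's standard `unicodedata` module. The helper name `describe` is made up for illustration. Note that `unicodedata` does not expose the Grapheme_Cluster_Break property, so the "extending character" classification itself is not shown here.

```python
import unicodedata


def describe(text):
    """Return (code point, name, general category, East Asian Width) for each character."""
    return [
        (
            "U+%04X" % ord(char),
            unicodedata.name(char, "<unnamed>"),
            unicodedata.category(char),
            unicodedata.east_asian_width(char),
        )
        for char in text
    ]


if __name__ == "__main__":
    # Halfwidth katakana KI followed by the halfwidth voiced sound mark.
    for row in describe("\uff77\uff9e"):
        print(row)
```

As far as I can tell, U+FF9E is reported with general category Lm rather than Mn, which may be one reason a width function driven by general category alone counts it as 1 cell.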

@jquast
Owner Author

jquast commented Feb 16, 2024

Thank you for your feedback @rivo I honestly appreciate it,

Any character following ZWJ (U+200D) when in sequence by function wcwidth.wcswidth().

How do you determine when such a sequence ends? E.g. how do you determine the width of a string such as this one:

👩‍👩‍👦‍👦abc (U+1f469, U+200d, U+1f469, U+200d, U+1f466, U+200d, U+1f466, 'a', 'b', 'c')

The characters that follow U+200d are not counted. So the first character, U+1f469 is of width 2, but the characters following the three U+200d's (U+1f469, U+1f466, U+1f466) are not counted, while 'abc' is counted normally. So the final width is 2 + 3 = 5:

>>> import wcwidth
>>> wcwidth.wcswidth('👩‍👩‍👦‍👦abc')
5

The algorithm in wcswidth is fairly basic,

wcwidth/wcwidth/wcwidth.py

Lines 189 to 192 in 056ee4b

if char == u'\u200D':
    # Zero Width Joiner, do not measure this or next character
    idx += 2
    continue
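
To make the arithmetic above easy to reproduce without installing anything, here is a self-contained sketch of the same "skip the ZWJ and the character after it" loop. It is not the wcwidth source: `approx_char_width` is a hypothetical, much simplified stand-in for `wcwidth.wcwidth()` built on the standard `unicodedata` module.

```python
import unicodedata

ZWJ = "\u200d"  # Zero Width Joiner


def approx_char_width(char):
    """Hypothetical stand-in for wcwidth.wcwidth(), for illustration only."""
    if unicodedata.category(char) in ("Mn", "Me", "Cf"):
        return 0  # combining marks and format characters take no cell
    if unicodedata.east_asian_width(char) in ("W", "F"):
        return 2  # East Asian Wide or Fullwidth
    return 1


def zwj_aware_width(text):
    """Measure text, not counting a ZWJ or the character that follows it."""
    width = 0
    idx = 0
    while idx < len(text):
        char = text[idx]
        if char == ZWJ:
            idx += 2  # skip the joiner and the next character
            continue
        width += approx_char_width(char)
        idx += 1
    return width


family = "\U0001f469\u200d\U0001f469\u200d\U0001f466\u200d\U0001f466"
print(zwj_aware_width(family + "abc"))  # 2 for the first emoji + 3 for 'abc'
```

The sequence "ends" implicitly: as soon as a character is not preceded by U+200D, it is measured normally again.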

There may be more, or maybe not, I don't know. Validating your algorithm would be a lot of effort.

Maybe you missed the details of the ucs-detect tool that I have written, but it does validate that the ZWJ algorithm is 100% compliant with Konsole, foot, iTerm2, and WezTerm. (A+ score for "ZWJ" at https://ucs-detect.readthedocs.io/results.html)

I find that writing "I authored a formal Specification detailing how characters should be measured" makes it sound more authoritative than it is

My apologies for that. I will modify all references to be very specific that it is the "Specification of how python wcwidth package measures..." etc. I can only say that I try to be as terse as possible. Because wcwidth is of interest to non-English speakers who may be reading it with difficulty or through translation, I try not to mince too many words, so it may come across as more authoritative than I intended.

U+FDFD: ﷽

In iTerm2, it spans 5 cells. In Chrome on macOS with a monospace font, it's even 11 cells wide.

As for these kinds of scripts (Arabic in this case), fixed-width fonts and the monospace constraints of a terminal are not appropriate. We can only do so much to interpret unicode.org specifications for the terminal environment, but measuring this kind of script isn't possible with the data files that they publish. Supporting this kind of thing would require digging into the font and its rendering engine, which wouldn't be very reasonable to implement for a general purpose command-line application library. Even terminal emulators don't often dig into the font engine. I don't believe that folks who use these languages will be very successful in designing interactive curses applications.

Aside, I do wish for there to be a terminal sequence to display variable-width fonts and be released from the constraints of monospacing for such languages. Such a sequence could be used or detected at the application level to assume that the position is indeterminate, and rely on "cursor position report" queries for only an approximation of the nearest current cell.
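
The "cursor position report" mentioned here is an existing terminal feature: an application writes `ESC [ 6 n` and the terminal replies with `ESC [ row ; col R`. Below is a minimal sketch of parsing such a reply. `parse_cursor_report` is a hypothetical helper name, and actually reading the reply from a live terminal (raw mode, timeouts) is deliberately left out.

```python
import re

CPR_QUERY = "\x1b[6n"  # Device Status Report: request cursor position
_CPR_REPLY = re.compile(r"\x1b\[(\d+);(\d+)R")


def parse_cursor_report(reply):
    """Return (row, column) from a cursor position report, or None if absent."""
    match = _CPR_REPLY.search(reply)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

An application could compare the column before and after writing a string to learn how far the terminal actually advanced, which is the kind of approximation described above.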

Because popular multi-language terminals (mlterm, foot, iTerm2, Konsole) measure it as width of 1 then that is what I wish for my library to report. I don't wish to invent any new specification or standards, I apologize if it is ever interpreted that way, I will include more phrasing in the README.rst to make that clear. For example, if I found a statement in a unicode.org document that disagreed with all popular terminals, I would rather our library and specification match the most popular terminals.

@rivo

rivo commented Feb 17, 2024

I appreciate your thoughtful response.

monospace constraints of a terminal

Our goals may differ but in my Golang implementation, I don't think I mention terminals at all. Of course, it's a common area where these libraries are being used. But, for example, lots of people use VS Code and other IDEs which also use monospace fonts (except for the few who swear by variable fonts for programming but let's ignore them for now). Increasingly, such editors are used for more than just programming. Markdown, for example, appears to be integrated in more and more contexts (e.g. blog publishing, note taking, e-book authoring) and since IDEs have good support for writing Markdown, people will naturally gravitate towards using them. So I expect that these kinds of algorithms are / will be relevant outside of terminals and for many different languages.

You are completely right in that there is value in offering an algorithm that matches the most commonly used terminals, even when they're "wrong". There is no point in deciding a character is 2 cells wide when all terminals render it in 1 cell. In tview, my terminal UI library, I can say ﷽ is 5 cells wide but when it gets rendered out to the terminal, I need to assume the terminal assumes it to be 1 cell, therefore I have to output another 4 space characters to match my own interpretation. So I need both my own width library and the wcwidth library which matches most terminals.
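
The padding trick described here can be sketched in a few lines. This is only an illustration written in Python rather than Go, not code from tview; `pad_for_terminal` and its parameters are hypothetical names, and both widths are supplied by the caller rather than computed.

```python
def pad_for_terminal(grapheme, layout_width, terminal_width):
    """Append spaces so the cursor ends up where the layout expects it.

    layout_width:   cells the application's own width library assigns
    terminal_width: cells the terminal is assumed to advance (e.g. per wcwidth)
    """
    missing = max(0, layout_width - terminal_width)
    return grapheme + " " * missing


# U+FDFD laid out as 5 cells, while the terminal is assumed to advance 1 cell.
padded = pad_for_terminal("\ufdfd", layout_width=5, terminal_width=1)
```

The case where the terminal advances further than the layout expects cannot be repaired by padding, so the sketch simply adds nothing there.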

(0x02b55, 0x02b55,), # Heavy Large Circle
(0x03030, 0x03030,), # Wavy Dash
(0x0303d, 0x0303d,), # Part Alternation Mark
(0x03297, 0x03297,), # Circled Ideograph Congratulation

CJK ideographs should never be narrow, emoji presentation or not.

@jquast
Owner Author

jquast commented Sep 17, 2025

I've converted this PR to a draft, though, I may restart the work with next release of ucs-detect, if I can find common behavior among popular/modern terminals.

I think I would recommend anyone never to use VS15 at all, it really isn't very necessary :)

@jacobsandlund

Just want to say as an author of yet another Unicode library (this one in Zig), this discussion has been really helpful in deciding what to report for wcwidth!

It would be interesting to render out these characters on different platforms with the most common fonts and check how their widths compare to our calculated widths.

This would be interesting indeed!

@jquast
Owner Author

jquast commented Oct 15, 2025

I'm glad you are interested. I refrained from merging, much less publishing results of the last round of VS-15 testing because I didn't want to upset anyone about "declaring" truthiness, that this one terminal gets it totally wrong and this other one is right.. but I know there is interest at least in understanding the "landscape" of support.

So I will just try very hard to write in plain English that this is just how Python wcwidth measures VS-15, that it is our specification only and is probably not as correct as we'd like, that there is no good specification for VS-15 for terminals (heck, all of unicode.org's specifications rarely even mention terminals), and that VS-15 is not yet well understood and results vary.

I'm just finishing up a large feature in the supporting "blessed" library to improve "ucs-detect" and make a new release that includes "DEC Private Mode Support", to query about and conditionally enable grapheme clustering, and to include that in the report. I have a feeling results will vary enough there that a query of this DEC mode is not a reliable indicator of support. Konsole, one of the better scoring terminals, does not negotiate/report about any DEC private modes, never mind this specific one.

So maybe at least another week or so, I will have a new report to share, and I will try to make time to include VS-15 results.

I did a few spot checks of VS-15 when implementing VS-16, and
erroneously believed that all emojis in VS-15 sequences were already
listed as an EAW width of 1. But that's not true. There are several
emojis that are "wide" that are changed to "narrow" with VS-15.

Add any additional U+FE0F/U+FE0E check in sequence of wcswidth() to
ensure 100% code coverage

The previous table was generated with inverted logic (keeping narrow
characters instead of wide characters), a kind of experiment that
I may study outside of this project. Regenerated/reverted vs-15 table

@jquast
Owner Author

jquast commented Oct 31, 2025

I have integrated with this branch for variation selector 15 support and done enough testing to discover that only kitty supports variation selector 15.

And, I made a specific test for those 4 characters described by @Jules-Bertholet, and kitty does not match the expectation that these four should not become narrow even if followed by vs-15.

Although this is a small minority of terminals, Kovid has published a specification for kitty at https://sw.kovidgoyal.net/kitty/text-sizing-protocol/#the-algorithm-for-splitting-text-into-cells logged about in kovidgoyal/kitty#8533 which closely matches our specification https://wcwidth.readthedocs.io/en/latest/specs.html for VS15 in this pull request.

I think for VS-15, that it is a rare character sequence, at least I don't often encounter it "in the wild" as a westerner, that it is fine for python wcwidth to match the behavior of only one other terminal, kitty, because at least that one has a published specification. This means we do not make the exception for those 4 characters.

Anyone is welcome to try to persuade Kovid, through a code and specification pull request to kitty, to adopt the exception described in #120 (review): that CJK characters, only 4 of them by our calculation, should not be allowed to be made narrow.

@jquast
Owner Author

jquast commented Oct 31, 2025

Maybe to be more clear, using the phrase "glitch sequence": I would say any kind of codepoint sequence that isn't really "sensible" could be called that. In this case, making codepoints narrow that should not be is not something wcwidth really needs to enforce or declare a specification for; it just makes the code slower and the specification longer.

@jquast jquast marked this pull request as ready for review November 1, 2025 23:29
@jquast
Owner Author

jquast commented Nov 2, 2025

Results of variation selector 15 testing in 34 terminals can be found at https://ucs-detect.org.readthedocs.build/results.html -- supported by only ghostty and kitty by my measure. I think it's fine to merge and release into python wcwidth, so I have converted this PR out of draft.

@jquast
Owner Author

jquast commented Nov 3, 2025

I have also published an article that briefly describes VS-15, https://www.jeffquast.com/post/state-of-terminal-emulation-2025/

Kitty and Ghostty are the only terminals that correctly support Variation Selector 15, I have not written much about it because it is not likely to see any practical use, but, it will be added to a future release of Python wcwidth now that there are multiple standards and reference implementations in agreement.

@jquast
Owner Author

jquast commented Nov 9, 2025

I found constructive criticism from @j4james and I hope it is OK with you if I share that here. I appreciate your expertise and feedback and invite you and anyone else to discuss it further.

To summarize briefly from HN, you wrote:

The only thing that VS-15 is supposed to do is give the emoji a "text
presentation", which the spec suggests as "black & white".

We do agree, TR51 only indicates to change to "text presentation", defined mostly as black and white by the spec.

It would have been nice if TR51 made any direct claims -- affirmative or negative -- in regards of changing wide to narrow or narrow to wide for either definitions of VS-15 and 16, it simply does not.

We agree completely, here, it sucks.

But it sounds like you believe that Variation Selector 16, which causes emoji style, should be 2 cells,

VS-16 changing a text presentation character from narrow to wide doesn't share
these problems. And that behaviour is supported by the spec text, which says
that emoji should generally have a square aspect ratio. And at one time the
East Asian Width spec specifically mentioned VS-16 making narrow characters
wide (but said nothing about VS-15 making wide characters narrow).

I think you are referring to TR11, which ends with a note in the section on Variation Sequences, first added in Unicode 16.0.0:

Note: Some variation sequences may specify a narrow or wide presentation behaviour. (This can be true for sequences defined in StandardizedVariants.txt [StandardizedVariants] in the [UCD] or in emoji-variation-sequences.txt in [UTS51].) For such sequences, the width behaviour specified by a variation sequence can differ from the East_Asian_Width property of the base character of that variation sequence. The width indication of the variation sequence can specify some aspects of glyph appearance such as advance width, while not specifying other aspects of presentation, such as rotational behavior in vertical text.

I think this is probably the "best" supporting statement regarding VS-15 I can find on unicode.org, it is very weak, and that sucks. But it cites emoji-variation-sequences.txt, which has only two kinds, VS-15 and 16.
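
For readers unfamiliar with that file, here is a minimal sketch of reading its "text style" (VS-15) and "emoji style" (VS-16) entries. The `SAMPLE` lines are a short illustrative excerpt in the file's format, not the full data, and `parse_variation_sequences` is a hypothetical helper name, not part of wcwidth.

```python
# Illustrative excerpt only; the real file lists many more sequences.
SAMPLE = """\
# emoji-variation-sequences.txt (excerpt)
231A FE0E  ; text style;  # (1.1) WATCH
231A FE0F  ; emoji style; # (1.1) WATCH
2B55 FE0E  ; text style;  # (5.2) HEAVY LARGE CIRCLE
2B55 FE0F  ; emoji style; # (5.2) HEAVY LARGE CIRCLE
"""


def parse_variation_sequences(text):
    """Map each style name to the list of base code points that accept it."""
    styles = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop trailing comments
        if not data:
            continue  # blank or comment-only line
        fields = [field.strip() for field in data.split(";")]
        base, _selector = (int(cp, 16) for cp in fields[0].split())
        styles.setdefault(fields[1], []).append(base)
    return styles


styles = parse_variation_sequences(SAMPLE)
```

In this excerpt every base appears under both styles, which matches the observation that the file defines only the two kinds of sequence.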

In any case, I have updated the bin/wcwidth-browser.py program to select for VS-15 and VS-16 (keys 5 or 6), for EastAsianWidth defined Narrow or Wide (keys 1 or 2), and to toggle displaying with or without the variation selector (w).

I am sharing the full output of wide characters with VS-15 in disagreement, which this project wishes to specify as Narrow by this pull request,

  • bin/wcwidth-browser.py --vs15 --wide=2
                                           Delimiters (|) should align, unicode version is auto.
|-|===================|-|===================|-|===================|-|===================|-|===================|-|===================
|⌚︎| 0x0fe0e Watch     |⌛︎| 0x0fe0e Hourglass |⏩︎| 0x0fe0e Black Ri $|⏪︎| 0x0fe0e Black Le $|⏫︎| 0x0fe0e Black Up $|⏬︎| 0x0fe0e Black Do $
|⏰︎| 0x0fe0e Alarm Cl $|⏳︎| 0x0fe0e Hourglas $|◽︎| 0x0fe0e White Me $|◾︎| 0x0fe0e Black Me $|☔︎| 0x0fe0e Umbrella $|☕︎| 0x0fe0e Hot Beve $
|♈︎| 0x0fe0e Aries     |♉︎| 0x0fe0e Taurus    |♊︎| 0x0fe0e Gemini    |♋︎| 0x0fe0e Cancer    |♌︎| 0x0fe0e Leo       |♍︎| 0x0fe0e Virgo
|♎︎| 0x0fe0e Libra     |♏︎| 0x0fe0e Scorpius  |♐︎| 0x0fe0e Sagittar $|♑︎| 0x0fe0e Capricorn |♒︎| 0x0fe0e Aquarius  |♓︎| 0x0fe0e Pisces
|♿︎| 0x0fe0e Wheelcha $|⚓︎| 0x0fe0e Anchor    |⚡︎| 0x0fe0e High Vol $|⚪︎| 0x0fe0e Medium W $|⚫︎| 0x0fe0e Medium B $|⚽︎| 0x0fe0e Soccer B $
|⚾︎| 0x0fe0e Baseball  |⛄︎| 0x0fe0e Snowman  $|⛅︎| 0x0fe0e Sun Behi $|⛎︎| 0x0fe0e Ophiuchus |⛔︎| 0x0fe0e No Entry  |⛪︎| 0x0fe0e Church
|⛲︎| 0x0fe0e Fountain  |⛳︎| 0x0fe0e Flag In  $|⛵︎| 0x0fe0e Sailboat  |⛺︎| 0x0fe0e Tent      |⛽︎| 0x0fe0e Fuel Pump |✅︎| 0x0fe0e White He $
|✊︎| 0x0fe0e Raised F $|✋︎| 0x0fe0e Raised H $|✨︎| 0x0fe0e Sparkles  |❌︎| 0x0fe0e Cross Mark|❎︎| 0x0fe0e Negative $|❓︎| 0x0fe0e Black Qu $
|❔︎| 0x0fe0e White Qu $|❕︎| 0x0fe0e White Ex $|❗︎| 0x0fe0e Heavy Ex $|➕︎| 0x0fe0e Heavy Pl $|➖︎| 0x0fe0e Heavy Mi $|➗︎| 0x0fe0e Heavy Di $
|➰︎| 0x0fe0e Curly Loop|➿︎| 0x0fe0e Double C $|⬛︎| 0x0fe0e Black La $|⬜︎| 0x0fe0e White La $|⭐︎| 0x0fe0e White Me $|⭕︎| 0x0fe0e Heavy La $
|〰︎| 0x0fe0e Wavy Dash |〽︎| 0x0fe0e Part Alt $|㊗︎| 0x0fe0e Circled  $|㊙︎| 0x0fe0e Circled  $|🀄︎| 0x0fe0e Mahjong  $|🈂︎| 0x0fe0e Squared  $
|🈚︎| 0x0fe0e Squared  $|🈯︎| 0x0fe0e Squared  $|🈷︎| 0x0fe0e Squared  $|🌍︎| 0x0fe0e Earth Gl $|🌎︎| 0x0fe0e Earth Gl $|🌏︎| 0x0fe0e Earth Gl $
|🌕︎| 0x0fe0e Full Moo $|🌜︎| 0x0fe0e Last Qua $|🍸︎| 0x0fe0e Cocktail $|🎓︎| 0x0fe0e Graduati $|🎧︎| 0x0fe0e Headphone |🎬︎| 0x0fe0e Clapper  $
|🎭︎| 0x0fe0e Performi $|🎮︎| 0x0fe0e Video Game|🏂︎| 0x0fe0e Snowboar $|🏄︎| 0x0fe0e Surfer    |🏆︎| 0x0fe0e Trophy    |🏊︎| 0x0fe0e Swimmer
|🏠︎| 0x0fe0e House Bu $|🏭︎| 0x0fe0e Factory   |🐈︎| 0x0fe0e Cat       |🐕︎| 0x0fe0e Dog       |🐟︎| 0x0fe0e Fish      |🐦︎| 0x0fe0e Bird
|👂︎| 0x0fe0e Ear       |👆︎| 0x0fe0e White Up $|👇︎| 0x0fe0e White Do $|👈︎| 0x0fe0e White Le $|👉︎| 0x0fe0e White Ri $|👍︎| 0x0fe0e Thumbs U $
|👎︎| 0x0fe0e Thumbs D $|👓︎| 0x0fe0e Eyeglasses|👪︎| 0x0fe0e Family    |👽︎| 0x0fe0e Extrater $|💣︎| 0x0fe0e Bomb      |💰︎| 0x0fe0e Money Bag
|💳︎| 0x0fe0e Credit C $|💻︎| 0x0fe0e Personal $|💿︎| 0x0fe0e Optical  $|📋︎| 0x0fe0e Clipboard |📚︎| 0x0fe0e Books     |📟︎| 0x0fe0e Pager
|📤︎| 0x0fe0e Outbox T $|📥︎| 0x0fe0e Inbox Tray|📦︎| 0x0fe0e Package   |📪︎| 0x0fe0e Closed M $|📫︎| 0x0fe0e Closed M $|📬︎| 0x0fe0e Open Mai $
|📭︎| 0x0fe0e Open Mai $|📷︎| 0x0fe0e Camera    |📹︎| 0x0fe0e Video Ca $|📺︎| 0x0fe0e Television|📻︎| 0x0fe0e Radio     |🔈︎| 0x0fe0e Speaker
|🔍︎| 0x0fe0e Left-poi $|🔒︎| 0x0fe0e Lock      |🔓︎| 0x0fe0e Open Lock |🕐︎| 0x0fe0e Clock Fa $|🕑︎| 0x0fe0e Clock Fa $|🕒︎| 0x0fe0e Clock Fa $
|🕓︎| 0x0fe0e Clock Fa $|🕔︎| 0x0fe0e Clock Fa $|🕕︎| 0x0fe0e Clock Fa $|🕖︎| 0x0fe0e Clock Fa $|🕗︎| 0x0fe0e Clock Fa $|🕘︎| 0x0fe0e Clock Fa $
|🕙︎| 0x0fe0e Clock Fa $|🕚︎| 0x0fe0e Clock Fa $|🕛︎| 0x0fe0e Clock Fa $|🕜︎| 0x0fe0e Clock Fa $|🕝︎| 0x0fe0e Clock Fa $|🕞︎| 0x0fe0e Clock Fa $
|🕟︎| 0x0fe0e Clock Fa $|🕠︎| 0x0fe0e Clock Fa $|🕡︎| 0x0fe0e Clock Fa $|🕢︎| 0x0fe0e Clock Fa $|🕣︎| 0x0fe0e Clock Fa $|🕤︎| 0x0fe0e Clock Fa $
|🕥︎| 0x0fe0e Clock Fa $|🕦︎| 0x0fe0e Clock Fa $|🕧︎| 0x0fe0e Clock Fa $|😐︎| 0x0fe0e Neutral  $|🚇︎| 0x0fe0e Metro     |🚍︎| 0x0fe0e Oncoming $
|🚑︎| 0x0fe0e Ambulance |🚔︎| 0x0fe0e Oncoming $|🚘︎| 0x0fe0e Oncoming $|🚭︎| 0x0fe0e No Smoki $|🚲︎| 0x0fe0e Bicycle   |🚹︎| 0x0fe0e Mens Sym $
|🚺︎| 0x0fe0e Womens S $|🚼︎| 0x0fe0e Baby Sym $

                                     Page 0(END) - [WIDE+VS15+W/VS] - q to quit, [keys: kjfbvc1256w-=]

And here the same table, but without vs-15 (press 'w' key to toggle):

  • bin/wcwidth-browser.py --vs15 --wide=2 --without-vs
                                           Delimiters (|) should align, unicode version is auto.
|--|===================|--|===================|--|===================|--|===================|--|===================|--|===================
|⌚| 0x0231a Watch     |⌛| 0x0231b Hourglass |⏩| 0x023e9 Black Ri $|⏪| 0x023ea Black Le $|⏫| 0x023eb Black Up $|⏬| 0x023ec Black Do $
|⏰| 0x023f0 Alarm Cl $|⏳| 0x023f3 Hourglas $|◽| 0x025fd White Me $|◾| 0x025fe Black Me $|☔| 0x02614 Umbrella $|☕| 0x02615 Hot Beve $
|♈| 0x02648 Aries     |♉| 0x02649 Taurus    |♊| 0x0264a Gemini    |♋| 0x0264b Cancer    |♌| 0x0264c Leo       |♍| 0x0264d Virgo
|♎| 0x0264e Libra     |♏| 0x0264f Scorpius  |♐| 0x02650 Sagittar $|♑| 0x02651 Capricorn |♒| 0x02652 Aquarius  |♓| 0x02653 Pisces
|♿| 0x0267f Wheelcha $|⚓| 0x02693 Anchor    |⚡| 0x026a1 High Vol $|⚪| 0x026aa Medium W $|⚫| 0x026ab Medium B $|⚽| 0x026bd Soccer B $
|⚾| 0x026be Baseball  |⛄| 0x026c4 Snowman  $|⛅| 0x026c5 Sun Behi $|⛎| 0x026ce Ophiuchus |⛔| 0x026d4 No Entry  |⛪| 0x026ea Church
|⛲| 0x026f2 Fountain  |⛳| 0x026f3 Flag In  $|⛵| 0x026f5 Sailboat  |⛺| 0x026fa Tent      |⛽| 0x026fd Fuel Pump |✅| 0x02705 White He $
|✊| 0x0270a Raised F $|✋| 0x0270b Raised H $|✨| 0x02728 Sparkles  |❌| 0x0274c Cross Mark|❎| 0x0274e Negative $|❓| 0x02753 Black Qu $
|❔| 0x02754 White Qu $|❕| 0x02755 White Ex $|❗| 0x02757 Heavy Ex $|➕| 0x02795 Heavy Pl $|➖| 0x02796 Heavy Mi $|➗| 0x02797 Heavy Di $
|➰| 0x027b0 Curly Loop|➿| 0x027bf Double C $|⬛| 0x02b1b Black La $|⬜| 0x02b1c White La $|⭐| 0x02b50 White Me $|⭕| 0x02b55 Heavy La $
|〰| 0x03030 Wavy Dash |〽| 0x0303d Part Alt $|㊗| 0x03297 Circled  $|㊙| 0x03299 Circled  $|🀄| 0x1f004 Mahjong  $|🈂| 0x1f202 Squared  $
|🈚| 0x1f21a Squared  $|🈯| 0x1f22f Squared  $|🈷| 0x1f237 Squared  $|🌍| 0x1f30d Earth Gl $|🌎| 0x1f30e Earth Gl $|🌏| 0x1f30f Earth Gl $
|🌕| 0x1f315 Full Moo $|🌜| 0x1f31c Last Qua $|🍸| 0x1f378 Cocktail $|🎓| 0x1f393 Graduati $|🎧| 0x1f3a7 Headphone |🎬| 0x1f3ac Clapper  $
|🎭| 0x1f3ad Performi $|🎮| 0x1f3ae Video Game|🏂| 0x1f3c2 Snowboar $|🏄| 0x1f3c4 Surfer    |🏆| 0x1f3c6 Trophy    |🏊| 0x1f3ca Swimmer
|🏠| 0x1f3e0 House Bu $|🏭| 0x1f3ed Factory   |🐈| 0x1f408 Cat       |🐕| 0x1f415 Dog       |🐟| 0x1f41f Fish      |🐦| 0x1f426 Bird
|👂| 0x1f442 Ear       |👆| 0x1f446 White Up $|👇| 0x1f447 White Do $|👈| 0x1f448 White Le $|👉| 0x1f449 White Ri $|👍| 0x1f44d Thumbs U $
|👎| 0x1f44e Thumbs D $|👓| 0x1f453 Eyeglasses|👪| 0x1f46a Family    |👽| 0x1f47d Extrater $|💣| 0x1f4a3 Bomb      |💰| 0x1f4b0 Money Bag
|💳| 0x1f4b3 Credit C $|💻| 0x1f4bb Personal $|💿| 0x1f4bf Optical  $|📋| 0x1f4cb Clipboard |📚| 0x1f4da Books     |📟| 0x1f4df Pager
|📤| 0x1f4e4 Outbox T $|📥| 0x1f4e5 Inbox Tray|📦| 0x1f4e6 Package   |📪| 0x1f4ea Closed M $|📫| 0x1f4eb Closed M $|📬| 0x1f4ec Open Mai $
|📭| 0x1f4ed Open Mai $|📷| 0x1f4f7 Camera    |📹| 0x1f4f9 Video Ca $|📺| 0x1f4fa Television|📻| 0x1f4fb Radio     |🔈| 0x1f508 Speaker
|🔍| 0x1f50d Left-poi $|🔒| 0x1f512 Lock      |🔓| 0x1f513 Open Lock |🕐| 0x1f550 Clock Fa $|🕑| 0x1f551 Clock Fa $|🕒| 0x1f552 Clock Fa $
|🕓| 0x1f553 Clock Fa $|🕔| 0x1f554 Clock Fa $|🕕| 0x1f555 Clock Fa $|🕖| 0x1f556 Clock Fa $|🕗| 0x1f557 Clock Fa $|🕘| 0x1f558 Clock Fa $
|🕙| 0x1f559 Clock Fa $|🕚| 0x1f55a Clock Fa $|🕛| 0x1f55b Clock Fa $|🕜| 0x1f55c Clock Fa $|🕝| 0x1f55d Clock Fa $|🕞| 0x1f55e Clock Fa $
|🕟| 0x1f55f Clock Fa $|🕠| 0x1f560 Clock Fa $|🕡| 0x1f561 Clock Fa $|🕢| 0x1f562 Clock Fa $|🕣| 0x1f563 Clock Fa $|🕤| 0x1f564 Clock Fa $
|🕥| 0x1f565 Clock Fa $|🕦| 0x1f566 Clock Fa $|🕧| 0x1f567 Clock Fa $|😐| 0x1f610 Neutral  $|🚇| 0x1f687 Metro     |🚍| 0x1f68d Oncoming $
|🚑| 0x1f691 Ambulance |🚔| 0x1f694 Oncoming $|🚘| 0x1f698 Oncoming $|🚭| 0x1f6ad No Smoki $|🚲| 0x1f6b2 Bicycle   |🚹| 0x1f6b9 Mens Sym $
|🚺| 0x1f6ba Womens S $|🚼| 0x1f6bc Baby Sym $

                                     Page 0(END) - [WIDE+VS15+WO/VS] - q to quit, [keys: kjfbvc1256w-=]

I'll also share what I understand, and please forgive me, this is just top-of-mind with a few links. I do not know much about the japanese language or culture and so I cannot properly research it.

These variations originate from the earliest emojis on Japanese handhelds. During their evolution, from the late 1980s LCD symbols into the early 2000s LED color emoji symbols, their size/DPI and color also increased.

Japanese is "monospacing friendly", for example, in a web browser of websites containing japanese, such as this tweet, it is surprisingly monospaced until the 3rd line because it includes variable-width english:

image

And this goes back to the earliest Japanese LCD devices and DOS-era CRT displays, and their frequent mixing of English in technology and software. Usually "English is 1 cell" and "Japanese is 2", like today's Unicode interpretation.

Here is an example from the PC-98 home computer:

image

Emojis are detailed and require 2 cells. Later examples, like these from 2001 mobile phones, show emoji use that is now very colorful and decorated. Note the two middle phones may have a strange "2x emoji" size, where 1 emoji is approximately two lines tall and 4 English letter cells wide (e.g. the width and height of mail)! Maybe these variation selectors are related to this, or maybe this is just one specific phone design or even user preference, I can't say.

The outer two phones represent emoji as "2 cells", eg. the man and the beer emoji line up with 2 columns of ascii letters above them. This is pretty typical.

image

I'm certain that emojis were represented as approximately "2 cells" for a long time, and that color LED technology made them more colorful and detailed. Some articles that include more photos,

image

The Unicode code points for airplane and clock faces are narrow in EastAsianWidth.txt. Without a variation selector they default to text style, and they are not even necessarily emojis; they're like some of the other basic shapes and symbols defined by EastAsianWidth.txt as Narrow.

And I think VS-15 serves to fulfill the need to display normally Wide emojis in a matching size and bitonal color, for only these select few early emojis, as if to show them in their historical or simplified form. I believe VS-15 exists as a mirrored property: turning a Wide emoji into text presentation makes it Narrow (VS-15), just as turning text into emoji presentation makes it Wide (VS-16).

This is the strongest interpretation outside of the specifications that I can make that text representation of emojis are Narrow on Modern displays.


And finally, @j4james's chief complaint appears to be about implementation,

Imagine trying to write a "narrow" text presentation emoji in the bottom right corner of the screen. You'd think it should fit, but the emoji is received before the VS-15 selector, and that doesn't fit, so the terminal is forced to wrap the text, triggering a scroll of the entire page. By the time the VS-15 arrives, there's no way to undo all of that.

I can understand that a naive implementation of "a VS-15 makes the previous cell narrow" is not compatible with the processing engine of a terminal, and that it would be terrible to do that. We agree here completely.

I do suggest considering that Unicode sequences (VS-15, VS-16, and ZWJ) are not very different from terminal sequences: once an \x1b is received in input or output, there really is no telling what might come next until the next character is read. And much like the delayed input of KEY_ESCAPE ('\x1b') vs. an application key like KEY_UP ('\x1b[A'), we really can't know that the escape key alone was pressed until a short timeout has elapsed without any subsequent character.

Perhaps a similar technique can be used: only for those wide emojis present in the emoji-variation-sequences.txt table that may be followed by VS-15 or VS-16, display is withheld for a short period so that any next character can make the final width known.

For performance, maybe you can continue to use "rewinding" logic, if that's something you've already done for VS-16 narrow-to-wide support and prefer it, and only use the "delayed check" logic when the Unicode code point in question (present in emoji-variation-sequences.txt) could have prevented a line wrap or could cause one, that is, when it is naturally Wide or Narrow and may sometimes be followed by VS-15 or VS-16.
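
A rough sketch of the "delayed check" idea, under heavy assumptions: it is written as a Python generator rather than terminal emulator code, the short timeout is omitted (a pending character is only flushed when the next character or the end of input arrives), and `MAY_TAKE_SELECTOR`, `base_width`, and `resolve_widths` are hypothetical names with a tiny made-up table.

```python
import unicodedata

VS15, VS16 = "\ufe0e", "\ufe0f"

# Hypothetical subset of code points listed in emoji-variation-sequences.txt.
MAY_TAKE_SELECTOR = {"\u231a", "\u2b55"}


def base_width(char):
    """Simplified default width: 2 if East Asian Wide/Fullwidth, else 1."""
    return 2 if unicodedata.east_asian_width(char) in ("W", "F") else 1


def resolve_widths(stream):
    """Yield (text, width) pairs only once each width is final."""
    pending = None
    for char in stream:
        if pending is not None:
            if char in (VS15, VS16):
                # The selector decides the width: narrow for VS-15, wide for VS-16.
                yield pending + char, 1 if char == VS15 else 2
                pending = None
                continue
            # No selector followed, so the default width stands.
            yield pending, base_width(pending)
            pending = None
        if char in MAY_TAKE_SELECTOR:
            pending = char  # hold back until the next character is seen
        else:
            yield char, base_width(char)
    if pending is not None:
        yield pending, base_width(pending)  # end of input acts like the timeout
```

With this shape, a wrap decision at the right margin would be made on the yielded pairs, so a narrowed emoji is never wrapped as if it were wide. A real terminal would still need the timeout so that a lone emoji is not held back indefinitely.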

@j4james

j4james commented Nov 10, 2025

The text I was referring to in TR-11 regarding VS-16 was from an earlier version. It used to be much more explicit.

[UTS51] emoji presentation sequences behave as though they were East Asian Wide, regardless of their assigned East_Asian_Width property value.

Aside from the practicality, the fact that TR-11 was so explicit about the VS-16 making glyphs wider, while having no mention of VS-15 making glyphs narrow, is (I think) what convinced most terminal devs that VS-15 should only affect the visual appearance, and not the width.

It's not like this topic hasn't been discussed before. It's just that almost everyone has already agreed on the correct behavior. That's why you're seeing the results you're seeing in your tests.
