-
Notifications
You must be signed in to change notification settings - Fork 285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed Windows Terminal UTF-16 surrogate pair issue #857
base: master
Are you sure you want to change the base?
Fixed Windows Terminal UTF-16 surrogate pair issue #857
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - I'd suggest making the changes mentioned, mostly for reducing maintenance burden.
const HIGH_SURROGATE_FIRST: u16 = 0xD800; | ||
const LOW_SURROGATE_FIRST: u16 = 0xDC00; | ||
const LOW_SURROGATE_LAST: u16 = 0xDFFF; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Constants generally should be at the top of the file
let ch = handle_surrogate(&mut buf, new); | ||
(buf, ch) | ||
} | ||
assert_eq!( | ||
[ | ||
// Continuous high surrogates | ||
testcase(Some(0xD800), 0xDBFF), | ||
// A normal surrogate pair | ||
testcase(Some(0xD800), 0xDC00), | ||
// An empty buffer and a high surrogate | ||
testcase(None, 0xDBFF), | ||
// An empty buffer and a low surrogate | ||
testcase(None, 0xDC00), | ||
], | ||
[ | ||
// The newer one will be stored | ||
(Some(0xDBFF), None), | ||
// A valid char will return | ||
(None, Some('\u{10000}')), | ||
// The high surrogate will be stored | ||
(Some(0xDBFF), None), | ||
// The low surrogate will be ignored | ||
(None, None), | ||
] | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simplify (individual assertions, no testcase function, descriptions on the assertions, arranged in order that makes most sense (empty to full)):
let ch = handle_surrogate(&mut buf, new); | |
(buf, ch) | |
} | |
assert_eq!( | |
[ | |
// Continuous high surrogates | |
testcase(Some(0xD800), 0xDBFF), | |
// A normal surrogate pair | |
testcase(Some(0xD800), 0xDC00), | |
// An empty buffer and a high surrogate | |
testcase(None, 0xDBFF), | |
// An empty buffer and a low surrogate | |
testcase(None, 0xDC00), | |
], | |
[ | |
// The newer one will be stored | |
(Some(0xDBFF), None), | |
// A valid char will return | |
(None, Some('\u{10000}')), | |
// The high surrogate will be stored | |
(Some(0xDBFF), None), | |
// The low surrogate will be ignored | |
(None, None), | |
] | |
); | |
let mut buf = None; | |
let next = handle_surrogate(&mut buf, 0xDC00); | |
assert_eq!( | |
(buf, next), | |
(None, None), | |
"Unexpected low surrogate is ignored" | |
); | |
let mut buf = None; | |
let next = handle_surrogate(&mut buf, 0xDBFF); | |
assert_eq!( | |
(buf, next), | |
(Some(0xDBFF), None), | |
"High surrogate is buffered" | |
); | |
let mut buf = Some(0xD800); | |
let next = handle_surrogate(&mut buf, 0xDBFF); | |
assert_eq!( | |
(buf, next), | |
(Some(0xDBFF), None), | |
"Consecutive high surrogates" | |
); | |
let mut buf = Some(0xD800); | |
let next = handle_surrogate(&mut buf, 0xDC00); | |
assert_eq!( | |
(buf, next), | |
(None, Some('\u{10000}')), | |
"Valid Surrogate pair" | |
); |
I am going to delete my fork, so read this(#787). This issue related how to deal with the issue of an illegal bytes into the input. https://github.com/tasogare3710/crossterm/tree/fix_handling_of_surrogate_pairs |
I've pushed a backup copy of your branch to https://github.com/joshka/crossterm/tree/illegal_surrogate_pairs_ppyp (this branch contains both commits), and I'll keep it around, so you should be fine to delete your fork whenever you want. Thanks for the heads up. |
Resolve #848
Windows Terminal seems to send the characters represented by surrogate pairs in the following order:
On the other hand, crossterm's surrogate pair handling does not distinguish between key-down/key-up events. This may be reasonable, as there are cases where another terminal sends only key-up events, but this results in an incorrect sequence of input from the Windows Terminal.
This fix uses only the last input in the case of a sequence of upper surrogates and ignores the input of lower surrogates with no upper surrogates; in the Windows Terminal, inputs 1 and 4 above are ignored and a single surrogate pair is created from inputs 2 and 3.