Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed Windows Terminal UTF-16 surrogate pair issue #857

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

kazatsuyu
Copy link

@kazatsuyu kazatsuyu commented Jan 10, 2024

Resolve #848

Windows Terminal seems to send the characters represented by surrogate pairs in the following order:

  1. key down event for the upper surrogate
  2. key up event for the upper surrogate
  3. key down event for the lower surrogate
  4. key up event for the lower surrogate

On the other hand, crossterm's surrogate pair handling does not distinguish between key-down/key-up events. This may be reasonable, as there are cases where another terminal sends only key-up events, but this results in an incorrect sequence of input from the Windows Terminal.

This fix uses only the last input in the case of a sequence of upper surrogates and ignores the input of lower surrogates with no upper surrogates; in the Windows Terminal, inputs 1 and 4 above are ignored and a single surrogate pair is created from inputs 2 and 3.

@kazatsuyu kazatsuyu requested a review from TimonPost as a code owner January 10, 2024 17:06
src/event/sys/windows/parse.rs Outdated Show resolved Hide resolved
src/event/sys/windows/parse.rs Show resolved Hide resolved
@kazatsuyu kazatsuyu requested a review from joshka May 16, 2024 12:11
Copy link
Collaborator

@joshka joshka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - I'd suggest making the changes mentioned, mostly for reducing maintenance burden.

Comment on lines +377 to +379
const HIGH_SURROGATE_FIRST: u16 = 0xD800;
const LOW_SURROGATE_FIRST: u16 = 0xDC00;
const LOW_SURROGATE_LAST: u16 = 0xDFFF;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Constants generally should be at the top of the file

Comment on lines +388 to +412
let ch = handle_surrogate(&mut buf, new);
(buf, ch)
}
assert_eq!(
[
// Continuous high surrogates
testcase(Some(0xD800), 0xDBFF),
// A normal surrogate pair
testcase(Some(0xD800), 0xDC00),
// An empty buffer and a high surrogate
testcase(None, 0xDBFF),
// An empty buffer and a low surrogate
testcase(None, 0xDC00),
],
[
// The newer one will be stored
(Some(0xDBFF), None),
// A valid char will return
(None, Some('\u{10000}')),
// The high surrogate will be stored
(Some(0xDBFF), None),
// The low surrogate will be ignored
(None, None),
]
);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplify (individual assertions, no testcase function, descriptions on the assertions, arranged in order that makes most sense (empty to full)):

Suggested change
let ch = handle_surrogate(&mut buf, new);
(buf, ch)
}
assert_eq!(
[
// Continuous high surrogates
testcase(Some(0xD800), 0xDBFF),
// A normal surrogate pair
testcase(Some(0xD800), 0xDC00),
// An empty buffer and a high surrogate
testcase(None, 0xDBFF),
// An empty buffer and a low surrogate
testcase(None, 0xDC00),
],
[
// The newer one will be stored
(Some(0xDBFF), None),
// A valid char will return
(None, Some('\u{10000}')),
// The high surrogate will be stored
(Some(0xDBFF), None),
// The low surrogate will be ignored
(None, None),
]
);
let mut buf = None;
let next = handle_surrogate(&mut buf, 0xDC00);
assert_eq!(
(buf, next),
(None, None),
"Unexpected low surrogate is ignored"
);
let mut buf = None;
let next = handle_surrogate(&mut buf, 0xDBFF);
assert_eq!(
(buf, next),
(Some(0xDBFF), None),
"High surrogate is buffered"
);
let mut buf = Some(0xD800);
let next = handle_surrogate(&mut buf, 0xDBFF);
assert_eq!(
(buf, next),
(Some(0xDBFF), None),
"Consecutive high surrogates"
);
let mut buf = Some(0xD800);
let next = handle_surrogate(&mut buf, 0xDC00);
assert_eq!(
(buf, next),
(None, Some('\u{10000}')),
"Valid Surrogate pair"
);

@tasogare3710
Copy link

tasogare3710 commented Aug 1, 2024

I am going to delete my fork, so read this(#787). This issue related how to deal with the issue of an illegal bytes into the input.
I may not be able to help because I don't have time to spend on crossterm.

https://github.com/tasogare3710/crossterm/tree/fix_handling_of_surrogate_pairs
https://github.com/tasogare3710/crossterm/tree/illegal_surrogate_pairs_ppyp

@joshka
Copy link
Collaborator

joshka commented Aug 2, 2024

I am going to delete my fork, so read this(#787). This issue related how to deal with the issue of an illegal bytes into the input. I may not be able to help because I don't have time to spend on crossterm.

https://github.com/tasogare3710/crossterm/tree/fix_handling_of_surrogate_pairs https://github.com/tasogare3710/crossterm/tree/illegal_surrogate_pairs_ppyp

I've pushed a backup copy of your branch to https://github.com/joshka/crossterm/tree/illegal_surrogate_pairs_ppyp (this branch contains both commits), and I'll keep it around, so you should be fine to delete your fork whenever you want. Thanks for the heads up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Surrogate pairs are ignored on Windows Terminal
3 participants