8356165: System.in in jshell replace supplementary characters with ?? #25079

Open
wants to merge 2 commits into
base: master

@lahodaj lahodaj commented May 7, 2025

When reading from System.in in a JShell snippet, JShell first reads the whole line (getting a String), and then converts the characters of this String to bytes on demand. However, it does not correctly convert code points that are represented by a surrogate pair: it tries to convert each surrogate separately, which cannot work.
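
To illustrate the failure mode described above, here is a minimal standalone sketch, not the JShell code itself. The class and method names are hypothetical, and UTF-8 is assumed explicitly, whereas the original code calls getBytes() with the platform default charset.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class PerCharEncodingDemo {
    // Hypothetical helper: encodes every UTF-16 code unit on its own,
    // the way a char-by-char conversion would.
    static byte[] encodePerChar(String line) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < line.length(); i++) {
            // A lone surrogate is malformed input, so getBytes substitutes '?'.
            byte[] bytes = String.valueOf(line.charAt(i)).getBytes(StandardCharsets.UTF_8);
            out.write(bytes, 0, bytes.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String smiley = "\uD83D\uDE03"; // U+1F603 as a surrogate pair
        byte[] broken = encodePerChar(smiley);
        System.out.println(new String(broken, StandardCharsets.UTF_8)); // prints "??"
        System.out.println(smiley.getBytes(StandardCharsets.UTF_8).length); // prints 4
    }
}
```

Each half of the pair is replaced with a single '?' byte, which matches the "??" in the issue title.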

The proposal is: when the current character is a high surrogate, peek at the next character, and if it is a low surrogate, convert both the high and low surrogates to bytes together.
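
The peek-then-pair idea can be sketched on a plain String as follows. This is only an illustration under stated assumptions: the class, the `encode` method, and the local `pointer` variable are hypothetical stand-ins, UTF-8 is assumed, and the real patch works on JShell's internal pending-line state rather than on a method argument.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class SurrogatePairEncoder {
    // Hypothetical helper: converts a line to bytes one code point at a time.
    static byte[] encode(String line) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int pointer = 0;
        while (pointer < line.length()) {
            StringBuilder dataToConvert = new StringBuilder();
            char current = line.charAt(pointer++);
            dataToConvert.append(current);
            // Peek at the next char; consume it only if it completes the pair.
            if (Character.isHighSurrogate(current)
                    && pointer < line.length()
                    && Character.isLowSurrogate(line.charAt(pointer))) {
                dataToConvert.append(line.charAt(pointer++));
            }
            byte[] bytes = dataToConvert.toString().getBytes(StandardCharsets.UTF_8);
            out.write(bytes, 0, bytes.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encode("\uD83D\uDE03");
        System.out.println(bytes.length); // prints 4
    }
}
```

An isolated high surrogate is still converted on its own (and therefore becomes '?'), since there is no valid encoding for it.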


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8356165: System.in in jshell replace supplementary characters with ?? (Bug - P3)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25079/head:pull/25079
$ git checkout pull/25079

Update a local copy of the PR:
$ git checkout pull/25079
$ git pull https://git.openjdk.org/jdk.git pull/25079/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 25079

View PR using the GUI difftool:
$ git pr show -t 25079

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25079.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented May 7, 2025

👋 Welcome back jlahoda! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented May 7, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot added the rfr Pull request is ready for review label May 7, 2025
@openjdk

openjdk bot commented May 7, 2025

@lahodaj The following label will be automatically applied to this pull request:

  • kulla

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@mlbridge

mlbridge bot commented May 7, 2025

Webrevs

@@ -977,7 +977,15 @@ public void perform(LineReaderImpl in) throws IOException {
     public synchronized int readUserInput() throws IOException {
         if (pendingBytes == null || pendingBytes.length <= pendingBytesPointer) {
             char userChar = readUserInputChar();
-            pendingBytes = String.valueOf(userChar).getBytes();
+            StringBuilder dataToConvert = new StringBuilder();
Member

Perhaps add the comment from the PR description here, for future readers:

[...] when the current character is a high surrogate, peek at the next character, and if it is a low surrogate, convert both the high and low surrogates to bytes together.

The (internal) API used in the implementation doesn't express that at first sight.

Comment on lines +983 to +986
if (pendingLine.length() > pendingLinePointer &&
    Character.isLowSurrogate(pendingLine.charAt(pendingLinePointer))) {
    dataToConvert.append(readUserInputChar());
}
Contributor

@tats-u tats-u May 8, 2025

How about calling readUserInputChar() unconditionally and then, only when the result is an isolated code unit rather than the second half of a surrogate pair, doing pendingLinePointer--?

The pendingLinePointer-- step is unlikely to happen for normal inputs; it would mostly be hit by things like penetration tests.
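
One possible reading of this suggestion, sketched on a standalone class: read the next char unconditionally and step the pointer back if it does not complete a pair. The class and its `readChar`, `nextUnit`, and `split` methods are hypothetical; only the field names echo the PR, and the real readUserInputChar() may behave differently (for example, it may need to fetch a new line), which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.List;

public class ReadThenStepBackDemo {
    private final String pendingLine;
    private int pendingLinePointer;

    ReadThenStepBackDemo(String line) {
        this.pendingLine = line;
    }

    // Hypothetical stand-in for readUserInputChar(): returns the next char and advances.
    private char readChar() {
        return pendingLine.charAt(pendingLinePointer++);
    }

    // Returns the next unit to convert: a full surrogate pair or a single char.
    String nextUnit() {
        char first = readChar();
        if (!Character.isHighSurrogate(first) || pendingLinePointer >= pendingLine.length()) {
            return String.valueOf(first);
        }
        char second = readChar();
        if (Character.isLowSurrogate(second)) {
            return new String(new char[] {first, second});
        }
        pendingLinePointer--; // not a pair: put the second char back
        return String.valueOf(first);
    }

    static List<String> split(String line) {
        ReadThenStepBackDemo reader = new ReadThenStepBackDemo(line);
        List<String> units = new ArrayList<>();
        while (reader.pendingLinePointer < line.length()) {
            units.add(reader.nextUnit());
        }
        return units;
    }
}
```

Even in this variant a bounds check is still needed before the second read, so the saving over the peek-based version is small.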

Comment on lines +52 to +53
inputSink.write("new String(System.in.readNBytes(4))\n\uD83D\uDE03\n");
waitOutput(out, "\"\uD83D\uDE03\"");
Contributor

@tats-u tats-u May 8, 2025

I think the following is more robust:

-            inputSink.write("new String(System.in.readNBytes(4))\n\uD83D\uDE03\n");
-            waitOutput(out, "\"\uD83D\uDE03\"");
+            inputSink.write("new String(System.in.readNBytes(5))\n\uD83D\uDE031\n");
+            waitOutput(out, "\"\uD83D\uDE031\"");
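
The byte counts in both test variants can be checked with a small sketch. The class and the `decodeFirstBytes` helper are hypothetical and only loosely mirror System.in.readNBytes(n); UTF-8 is assumed. A plausible reason for the suggestion, not stated by the reviewer, is that the trailing "1" also verifies that exactly the pair was consumed and the following character still arrives intact.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TrailingByteCheck {
    // Hypothetical helper: encodes the input and decodes only its first n bytes.
    static String decodeFirstBytes(String input, int n) {
        byte[] all = input.getBytes(StandardCharsets.UTF_8);
        return new String(Arrays.copyOf(all, n), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+1F603 alone takes 4 bytes in UTF-8.
        System.out.println("\uD83D\uDE03".getBytes(StandardCharsets.UTF_8).length);  // prints 4
        // With the trailing ASCII digit the line content takes 5 bytes.
        System.out.println("\uD83D\uDE031".getBytes(StandardCharsets.UTF_8).length); // prints 5
        // Reading 5 bytes yields the smiley followed by '1'.
        System.out.println(decodeFirstBytes("\uD83D\uDE031\n", 5).equals("\uD83D\uDE031")); // prints true
    }
}
```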

Labels
kulla, rfr (Pull request is ready for review)
3 participants