utf-16 surrogates are not processed properly #75

Open
R-maan opened this issue Jun 19, 2020 · 0 comments
Labels
bug Something isn't working

Comments

@R-maan
Contributor

R-maan commented Jun 19, 2020

In ion-test, equivs/utf8/stringutf8.ion contains UTF-16 representations (lines 44 and 50) that must be interpreted as equal; however, the current implementation fails to process UTF-16 values properly in round-trip tests.

The root cause might be in tokenizer.go#readString(), where we call WriteRune(r). This causes the reader to return unknown characters, and anything done with the string value from that reader can be erroneous, since the data is already malformed.

When we try to read the values for lines 44 and 50 in the above file, we get back "0xEF 0xBF 0xBD" (the UTF-8 encoding of U+FFFD, the Unicode replacement character) for both the high surrogate and the low surrogate. These bytes cannot be translated to a valid UTF-8 value later on when we use them in bitstream, so 0xFFFD is returned, and that is when the tests fail.
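The behavior described above can be reproduced outside the tokenizer. The following is a minimal sketch, not the actual readString() code: the helper writeRunes is hypothetical, and the pair 0xD83D / 0xDE00 (the surrogate halves of U+1F600) is an arbitrary example rather than the values from the test file. It shows that writing each surrogate half on its own with strings.Builder.WriteRune yields the bytes EF BF BD, while writing the combined code point yields valid UTF-8.

```go
package main

import (
	"fmt"
	"strings"
)

// writeRunes is a hypothetical stand-in for a tokenizer that appends each
// decoded escape value to a builder one rune at a time.
func writeRunes(rs ...rune) []byte {
	var sb strings.Builder
	for _, r := range rs {
		// WriteRune encodes invalid runes, including lone surrogate
		// halves, as U+FFFD.
		sb.WriteRune(r)
	}
	return []byte(sb.String())
}

func main() {
	// Example pair only: 0xD83D and 0xDE00 together encode U+1F600 in UTF-16.
	fmt.Printf("high surrogate: % X\n", writeRunes(0xD83D))
	fmt.Printf("low surrogate:  % X\n", writeRunes(0xDE00))
	fmt.Printf("code point:     % X\n", writeRunes(0x1F600))
}
```

Once the halves have been written separately, the original code point cannot be recovered from the resulting string.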

One possible solution is to pass an encoding parameter to the reader (see ion-dotnet for an example).
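A different approach from the encoding parameter suggested above would be to combine a high surrogate with the low surrogate that follows it before anything is written to the builder. The sketch below is only an illustration under that assumption: joinSurrogates is a hypothetical helper, not part of the tokenizer, and it assumes the escape values are already available as a slice of runes. It uses utf16.IsSurrogate and utf16.DecodeRune from the standard library.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf16"
)

const replacement = '\uFFFD'

// joinSurrogates is a hypothetical helper. It writes a sequence of escape
// values to a string, merging each valid surrogate pair into one code point.
func joinSurrogates(units []rune) string {
	var sb strings.Builder
	for i := 0; i < len(units); i++ {
		r := units[i]
		if utf16.IsSurrogate(r) && i+1 < len(units) {
			// DecodeRune returns U+FFFD when the two values are not a
			// valid high/low pair.
			if c := utf16.DecodeRune(r, units[i+1]); c != replacement {
				sb.WriteRune(c)
				i++ // the low surrogate has been consumed as well
				continue
			}
		}
		// Ordinary runes are written as-is; unpaired surrogates still
		// end up as U+FFFD.
		sb.WriteRune(r)
	}
	return sb.String()
}

func main() {
	// Example pair only: 0xD83D + 0xDE00 is U+1F600.
	fmt.Printf("% X\n", []byte(joinSurrogates([]rune{0xD83D, 0xDE00})))
}
```

With this kind of pairing, both escape forms of the same character would produce identical UTF-8 bytes, which is what the equivalence test expects. Whether this or an encoding parameter fits the reader better is left open here.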

@therapon therapon added the bug Something isn't working label Jun 19, 2020