-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support charsets other than UTF-8 #150
base: master
Are you sure you want to change the base?
Conversation
Thanks, I will take a look at this tomorrow or first days of January. Just by scrolling over the code changes your PR looks good. Will check more in detail. Thanks for your contrinution and help 👍 |
if (source == null) { | ||
throw new NullPointerException("source == null"); | ||
} | ||
this.source = source; | ||
this.buffer = source.buffer(); | ||
this.charset = charset; | ||
UNQUOTED_STRING_TERMINALS = ByteString.encodeString(" >/=\n", charset); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
public void defaultCharsetUTF8() { | ||
TikXml tikXml = new TikXml.Builder().build(); | ||
|
||
Assert.assertEquals(StandardCharsets.UTF_8, tikXml.config.charset()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sweet! great test
@@ -435,7 +439,7 @@ public void xmlDeclarationInBeginElementScope() throws IOException { | |||
Assert.fail("Exception expected"); | |||
} catch (IOException e) { | |||
Assert.assertEquals( | |||
"Xml Declatraion <?xml version=\"1.0\" encoding=\"UTF-8\"?> can only be written at the beginning of a xml document! You are not at the beginning of a xml document: current xml scope is ELEMENT_OPENING at path /foo", | |||
"Xml Declaration <?xml version=\"1.0\" encoding=\"UTF-8\"?> can only be written at the beginning of a xml document! You are not at the beginning of a xml document: current xml scope is ELEMENT_OPENING at path /foo", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch!
.readString(StandardCharsets.ISO_8859_1); | ||
|
||
Assert.assertEquals(expected, actual); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice test coverage!
Overall looks good to me, the real question to me though is if we should set the charset in the builder up front or read the XML declaration encoding property: |
I had the same thought, but wasn't sure if that was too out of scope. I think I would approach it by using the default UTF-8 encoding as it already is to read and write, also allow an encoding to be set for reading and writing (like I've done here), then finally only when reading XML if an encoding property exists it would use that encoding instead and disregard whatever was set in the config. The only question is should we allow clients to override the encoding property when reading in case it is incorrect? This could be interpreted as a breaking change as well, since currently all files are read using UTF-8 encoding, whereas this would start reading files using the encoding declared in the file. |
@Bodo1981 what do you think about adding support for this in general? |
No description provided.