Multi-byte characters are split in writeCData() if first byte sits right at the end of the buffer #86

Closed
tatsel opened this issue Jun 5, 2024 · 3 comments

@tatsel
Contributor

tatsel commented Jun 5, 2024

This issue seems to be similar to the one fixed here https://github.com/FasterXML/aalto-xml/pull/75/commits, though the cause is a bit different:

I get a 'javax.xml.stream.XMLStreamException: Incomplete surrogate pair in content: first char 0xdfce, second 0x78' exception when I try to write CDATA containing a multi-byte character (a surrogate pair in Java) whose first char sits right at the boundary of the 512-sized internal buffer.
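
To make the boundary condition concrete, here is a small standalone sketch (plain JDK, no aalto classes) that rebuilds the test text and checks where the surrogate pair falls. The class name, the method names, and the CHUNK_SIZE constant are made up for illustration; the value 512 is taken from the buffer size mentioned above, not from the aalto source.

```java
public class SurrogateBoundaryDemo {
    // Assumption: chunk size equal to the 512-sized buffer mentioned in the report.
    static final int CHUNK_SIZE = 512;

    /** Builds the same text as the reproduction: 511 'x', U+1D7CE, 512 'x'. */
    static String buildText() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 511; i++) {
            sb.append('x');
        }
        sb.append("\uD835\uDFCE"); // one code point, two Java chars
        for (int i = 0; i < 512; i++) {
            sb.append('x');
        }
        return sb.toString();
    }

    /** True if cutting the text before index 'cut' separates a high surrogate from its low surrogate. */
    static boolean splitsPairAt(String text, int cut) {
        return cut > 0 && cut < text.length()
                && Character.isHighSurrogate(text.charAt(cut - 1))
                && Character.isLowSurrogate(text.charAt(cut));
    }

    public static void main(String[] args) {
        String text = buildText();
        System.out.println("length in chars: " + text.length());
        System.out.println("code points: " + text.codePointCount(0, text.length()));
        System.out.printf("char 511: 0x%04x, char 512: 0x%04x, char 513: 0x%04x%n",
                (int) text.charAt(511), (int) text.charAt(512), (int) text.charAt(513));
        System.out.println("pair split at " + CHUNK_SIZE + ": " + splitsPairAt(text, CHUNK_SIZE));
    }
}
```

The char at index 511 is the high surrogate 0xd835, and the chars at indexes 512 and 513 are 0xdfce and 0x78, the same two values the exception message reports. That would be consistent with the second chunk being processed without the high surrogate carried over from the first.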

Example test to reproduce (copied from https://github.com/FasterXML/aalto-xml/blob/master/src/test/java/com/fasterxml/aalto/sax/TestSaxWriter.java#L10 and slightly adjusted for writeCData()):

// 511 'x' chars, then U+1D7CE as a surrogate pair, so the high surrogate is the 512th char
StringBuilder testText = new StringBuilder();
for (int i = 0; i < 511; i++) {
    testText.append('x');
}
testText.append("\uD835\uDFCE");
for (int i = 0; i < 512; i++) {
    testText.append('x');
}
WriterConfig writerConfig = new WriterConfig();
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
Utf8XmlWriter writer = new Utf8XmlWriter(writerConfig, byteArrayOutputStream);
writer.writeStartTagStart(writer.constructName("testelement"));
writer.writeCData(testText.toString());
writer.writeStartTagEnd();
writer.writeEndTag(writer.constructName("testelement"));
writer.close(false);

I think the reason is that ByteXmlWriter#writeCDataContents() is missing the following piece of code, which exists in writeCharacters():

if (_surrogate != 0) {
    outputSurrogates(_surrogate, cbuf[offset]);
    // reset the temporary surrogate storage
    _surrogate = 0;
    ++offset;
    --len;
}
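
The carry-over idea behind that snippet can be sketched in isolation. The following is a minimal, self-contained illustration and not aalto's implementation: the class SurrogateCarryDemo, its methods, and the pendingSurrogate field are hypothetical names, and the UTF-8 encoding is delegated to the JDK for brevity. It shows a chunked writer that remembers a trailing high surrogate and combines it with the first char of the next chunk.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class SurrogateCarryDemo {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    // High surrogate left over from the previous chunk; 0 means none is pending.
    private int pendingSurrogate = 0;

    /** Writes one chunk of chars as UTF-8, carrying a split surrogate pair across calls. */
    void writeChunk(char[] cbuf, int offset, int len) {
        if (len <= 0) {
            return;
        }
        // Same shape as the snippet above: finish the pair started by the previous chunk.
        if (pendingSurrogate != 0) {
            writeCodePoint(combine(pendingSurrogate, cbuf[offset]));
            pendingSurrogate = 0;
            ++offset;
            --len;
        }
        int end = offset + len;
        while (offset < end) {
            char c = cbuf[offset++];
            if (Character.isHighSurrogate(c)) {
                if (offset >= end) {
                    // Pair is split by the chunk boundary: remember the first half.
                    pendingSurrogate = c;
                    return;
                }
                writeCodePoint(combine(c, cbuf[offset++]));
            } else if (Character.isLowSurrogate(c)) {
                throw new IllegalArgumentException("Unpaired low surrogate 0x" + Integer.toHexString(c));
            } else {
                writeCodePoint(c);
            }
        }
    }

    private static int combine(int high, char low) {
        if (!Character.isLowSurrogate(low)) {
            throw new IllegalArgumentException("Incomplete surrogate pair: second char 0x" + Integer.toHexString(low));
        }
        return Character.toCodePoint((char) high, low);
    }

    private void writeCodePoint(int codePoint) {
        byte[] bytes = new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
    }

    byte[] toByteArray() {
        return out.toByteArray();
    }

    /** Feeds the text to a fresh writer in fixed-size chunks and returns the encoded bytes. */
    static byte[] encodeInChunks(String text, int chunkSize) {
        SurrogateCarryDemo writer = new SurrogateCarryDemo();
        char[] buf = new char[chunkSize];
        for (int pos = 0; pos < text.length(); pos += chunkSize) {
            int n = Math.min(chunkSize, text.length() - pos);
            text.getChars(pos, pos + n, buf, 0);
            writer.writeChunk(buf, 0, n);
        }
        return writer.toByteArray();
    }

    public static void main(String[] args) {
        String text = "x".repeat(511) + "\uD835\uDFCE" + "x".repeat(512);
        byte[] chunked = encodeInChunks(text, 512);
        byte[] direct = text.getBytes(StandardCharsets.UTF_8);
        System.out.println("matches direct encoding: " + java.util.Arrays.equals(chunked, direct));
    }
}
```

Without the pending-surrogate check at the top of writeChunk, the second chunk would start with a lone low surrogate and fail, which is the situation described in this issue. A pending surrogate at the very end of the input is silently dropped in this sketch; a real writer would need to report it.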
tatsel mentioned this issue Jun 5, 2024
@tatsel
Contributor Author

tatsel commented Jun 5, 2024

I added the test above and a potential fix in pull request #87. Please check whether this is the correct way to address the issue.

@cowtowncoder
Member

Thank you for reporting the issue and for providing both a reproduction and a fix!
I decided to check a couple of other cases as well, and a similar issue affected comments (#91) and processing instructions (#93) too, so I fixed those as well.

I can release 1.3.3 soon, but I wanted to give you a chance to see if I missed anything with the fix before publishing.

@tatsel
Contributor Author

tatsel commented Jun 7, 2024

Thank you for the quick response! I'm glad I helped to find this. I have no further comments and will wait for the release.
