Sequence ']]>' is not escaped in PCDATA #130

Open
FraGag opened this issue Sep 6, 2016 · 5 comments

Comments

FraGag commented Sep 6, 2016

The sequence ]]> must not occur in PCDATA, but xml-rs fails to escape the > in this sequence when it is emitted through EventWriter in an XmlEvent::Characters event. The xml-rs reader correctly reports an error when it parses the resulting document.

The fault probably lies in xml::escape::escape_str_pcdata, which handles escaping special characters for a PCDATA context. There are two options for resolving this: either systematically escape > as &gt;, or check for the sequence ]]> specifically and escape > only when it is part of that sequence.
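To make the two options concrete, here is a minimal sketch of each. The function names are hypothetical and this is not the actual body of escape_str_pcdata; it only assumes that < and & are already escaped the usual way. Note that the second variant only sees one string at a time, so it cannot catch a ]]> that is split across separate writes.

```rust
/// Option 1 (hypothetical helper): always escape `>` together with `<` and `&`.
fn escape_pcdata_always(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '&' => out.push_str("&amp;"),
            _ => out.push(c),
        }
    }
    out
}

/// Option 2 (hypothetical helper): escape `>` only when it directly follows `]]`.
fn escape_pcdata_minimal(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    let mut brackets = 0usize; // number of consecutive `]` seen just before `c`
    for c in s.chars() {
        match c {
            '<' => out.push_str("&lt;"),
            '&' => out.push_str("&amp;"),
            '>' if brackets >= 2 => out.push_str("&gt;"),
            _ => out.push(c),
        }
        brackets = if c == ']' { brackets + 1 } else { 0 };
    }
    out
}
```

The first variant is simpler and never produces ]]>, at the cost of slightly noisier output; the second leaves an ordinary > untouched.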

Sample program to reproduce the problem:

extern crate xml;

use xml::reader::EventReader;
use xml::writer::EventWriter;
use xml::writer::events::XmlEvent;

fn main() {
    let mut v = Vec::new();
    {
        let mut ew = EventWriter::new(&mut v);
        ew.write(XmlEvent::start_element("root")).unwrap();
        ew.write(XmlEvent::characters("invalid ]]> invalid")).unwrap();
        ew.write(XmlEvent::end_element()).unwrap();
    }

    let er = EventReader::new(&v[..]);
    for ev in er {
        println!("{:?}", ev);
    }
}

Output:

Ok(StartDocument(1.0, utf-8, None))
Ok(StartElement(root, {"": "", "xml": "http://www.w3.org/XML/1998/namespace", "xmlns": "http://www.w3.org/2000/xmlns/"}))
Err(Error { pos: 1:53, kind: Syntax("Unexpected token: ]]>") })
Owner

netvl commented Sep 6, 2016

EventWriter was made as simple as possible initially, so it does not check for such invalid sequences. Actually, I'm not sure if it is possible to check for them reliably. For example, you can write two character events sequentially, XmlEvent::characters("]]") and XmlEvent::characters(">"). They will result in an invalid XML document, but I doubt that it would be possible to check for them, because EventWriter writes directly to the output stream, without explicit buffering, and I do not want to introduce such buffering.

@whitequark

> For example, you can write two character events sequentially, XmlEvent::characters("]]") and XmlEvent::characters(">"). They will result in an invalid XML document, but I doubt that it would be possible to check for them, because EventWriter writes directly to the output stream, without explicit buffering, and I do not want to introduce such buffering.

A reliable, lightweight way to check for such sequences is to have a state machine that recognizes the ]]> or -- sequences and is plugged just before the final write to the output stream.

Owner

netvl commented Apr 24, 2017

Well, a state machine like this is a kind of a buffer, and it would require that the entire output goes through it, code point by code point. I'm not really sure it is worth the complexity it would introduce.

@whitequark

> Well, a state machine like this is a kind of a buffer

I don't see what about a state machine is like a buffer at all.

> I'm not really sure it is worth the complexity it would introduce.

I don't see the complexity you mention, either. The entire thing is one wrapper struct implementing the Write trait. Besides, there can be no argument on whether it's "worth the complexity" while the straightforward way to use the library causes it to produce invalid XML in cases most people will likely not check for.
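For illustration, a rough sketch of what such a wrapper could look like. TextGuard and write_markup are hypothetical names, not part of xml-rs, and the sketch assumes the writer can tell the wrapper which bytes are markup (tags, CDATA delimiters) and which are character data. The only state kept between writes is a small counter, so nothing is buffered.

```rust
use std::io::{self, Write};

/// Hypothetical wrapper: escapes the `>` of any `]]>` that appears in
/// character data, even when the sequence is split across several writes.
struct TextGuard<W: Write> {
    inner: W,
    brackets: u8, // consecutive `]` bytes seen so far, saturating at 2
}

impl<W: Write> TextGuard<W> {
    fn new(inner: W) -> Self {
        TextGuard { inner, brackets: 0 }
    }

    /// Markup passes through unchanged and resets the state machine.
    fn write_markup(&mut self, bytes: &[u8]) -> io::Result<()> {
        self.brackets = 0;
        self.inner.write_all(bytes)
    }

    fn into_inner(self) -> W {
        self.inner
    }
}

/// Character data goes through the `Write` impl.
impl<W: Write> Write for TextGuard<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let mut start = 0; // start of the run not yet forwarded to `inner`
        for (i, &b) in buf.iter().enumerate() {
            if b == b'>' && self.brackets == 2 {
                // Forward everything before the `>`, then the escaped form.
                self.inner.write_all(&buf[start..i])?;
                self.inner.write_all(b"&gt;")?;
                start = i + 1;
            }
            self.brackets = if b == b']' { (self.brackets + 1).min(2) } else { 0 };
        }
        self.inner.write_all(&buf[start..])?;
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}
```

Working on raw bytes is safe for UTF-8 output because ] and > are ASCII and never occur inside a multi-byte sequence. Writing "]]" and ">" as two separate text writes yields ]]&gt; in the output, while a > that does not follow ]] is forwarded as-is.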


jsoo1 commented May 31, 2020

Maybe the docs should be updated to warn that the closing CDATA sequence (]]>) is not escaped. Currently the cdata function gives the impression that the closing characters are escaped:

https://docs.rs/xml-rs/0.8.3/xml/writer/events/enum.XmlEvent.html#method.cdata
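Until that is addressed, one common workaround on the caller's side is to split the text so that no single CDATA section contains ]]>. A minimal sketch, assuming the caller builds the section as a string; cdata_section is a hypothetical helper, not an xml-rs function:

```rust
/// Hypothetical helper: wraps `text` in CDATA, ending the section after `]]`
/// and reopening it before `>` wherever the text itself contains `]]>`.
fn cdata_section(text: &str) -> String {
    format!("<![CDATA[{}]]>", text.replace("]]>", "]]]]><![CDATA[>"))
}
```

A parser concatenates the adjacent sections, so the original text, including its ]]>, is recovered on reading.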

kornelski self-assigned this May 10, 2023