XML writer / parser does not implement attribute normalization, it is not conformant #97

Open
bubnikv opened this issue May 4, 2022 · 1 comment

bubnikv commented May 4, 2022

boost::property_tree can read back the XML attribute values it writes; however, it does not seem to implement XML attribute-value normalization:
https://www.w3.org/TR/REC-xml/#AVNormalize

Namely, if an XML attribute value contains newlines or tabs, libexpat replaces them with spaces after parsing and before passing the value to the application. This normalization is mandatory, as shown by the following table from the XML specification:

[Image: attribute-value normalization table from the XML specification, section 3.3.3]

Please note that an attribute type is considered CDATA by default.
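
To make the rule concrete, here is a minimal sketch of what a conforming parser hands to the application for a CDATA attribute. The function name `normalize_cdata_attribute` is made up for illustration, and the sketch assumes the literal value contains no entity or character references, which the specification treats separately.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of Boost or libexpat): applies the CDATA
// attribute-value normalization rule to a literal value without references.
std::string normalize_cdata_attribute(const std::string& raw) {
    std::string out;
    out.reserve(raw.size());
    for (std::string::size_type i = 0; i < raw.size(); ++i) {
        const char c = raw[i];
        // Line-end handling comes first: CR LF counts as a single line feed,
        // so the CR is dropped and the following LF is mapped below.
        if (c == '\r' && i + 1 < raw.size() && raw[i + 1] == '\n') {
            continue;
        }
        // Each tab, line feed, or carriage return becomes exactly one space.
        out += (c == '\t' || c == '\n' || c == '\r') ? ' ' : c;
    }
    return out;
}
```

Note that for CDATA nothing is trimmed and runs of spaces are not collapsed; that extra step applies only to attributes declared with a non-CDATA type.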

And by the way, none of the XML parsers / writers that I checked (boost serialization, tinyxml, lib3mf) seem to implement normalization.

This issue is similar to 3MFConsortium/lib3mf#288

ljluestc commented Feb 8, 2025


#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/xml_parser.hpp>
#include <boost/algorithm/string.hpp>
#include <exception>
#include <iostream>
#include <sstream>
#include <string>

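// Function to normalize XML attribute values before they are written.
// Note: for CDATA attributes the XML spec only maps each tab, line feed, and
// carriage return to a space; collapsing and trimming apply to non-CDATA types.
std::string normalize_attribute_value(const std::string& value) {
    std::string normalized_value = value;

    // Replace newlines, carriage returns, and tabs with spaces
    boost::replace_all(normalized_value, "\r\n", " ");
    boost::replace_all(normalized_value, "\n", " ");
    boost::replace_all(normalized_value, "\r", " ");
    boost::replace_all(normalized_value, "\t", " ");

    // Collapse every run of spaces into a single space.
    // A single replace_all pass is not enough: it would turn "   " into "  ".
    std::string::size_type pos;
    while ((pos = normalized_value.find("  ")) != std::string::npos) {
        normalized_value.erase(pos, 1);
    }

    // Trim leading and trailing spaces
    boost::trim(normalized_value);

    return normalized_value;
}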

// Custom XML writer settings
struct custom_xml_writer_settings {
    char indent_char;
    int indent_count;
    bool normalize_attributes;

    custom_xml_writer_settings(char indent_char = ' ', int indent_count = 4, bool normalize_attributes = true)
        : indent_char(indent_char), indent_count(indent_count), normalize_attributes(normalize_attributes) {}
};

// Recursively normalize every value stored under an "<xmlattr>" node,
// so that attributes on nested elements are covered as well
template <typename Ptree>
void normalize_attributes(Ptree& pt) {
    for (auto& child : pt) {
        if (child.first == "<xmlattr>") {
            for (auto& attr : child.second) {
                attr.second.data() = normalize_attribute_value(attr.second.data());
            }
        } else {
            normalize_attributes(child.second);
        }
    }
}

// Custom XML writer
template <typename Ptree>
void write_xml_custom(std::basic_ostream<typename Ptree::key_type::value_type>& stream,
                      const Ptree& pt,
                      const custom_xml_writer_settings& settings = custom_xml_writer_settings()) {
    // Create a copy of the property tree to normalize attributes
    Ptree normalized_pt = pt;

    if (settings.normalize_attributes) {
        normalize_attributes(normalized_pt);
    }

    // Write the normalized property tree to XML.
    // Since Boost 1.56, xml_writer_settings takes the string type, not the character type.
    boost::property_tree::xml_writer_settings<typename Ptree::key_type> writer_settings(settings.indent_char, settings.indent_count);
    boost::property_tree::write_xml(stream, normalized_pt, writer_settings);
}

int main() {
    try {
        // Create a property tree
        boost::property_tree::ptree pt;

        // Add data with non-normalized attribute values
        pt.put("root.<xmlattr>.attr1", "Hello\tWorld\n");
        pt.put("root.<xmlattr>.attr2", "  Multiple   spaces  ");
        pt.put("root.value", "Some value");

        // Write the property tree to XML with normalization
        std::ostringstream xml_stream;
        write_xml_custom(xml_stream, pt);

        // Print the normalized XML
        std::cout << "Normalized XML:\n" << xml_stream.str() << std::endl;

        // Read the normalized XML back into a property tree
        boost::property_tree::ptree pt2;
        std::istringstream xml_input(xml_stream.str());
        boost::property_tree::read_xml(xml_input, pt2);

        // Print the attribute values to verify normalization
        std::cout << "attr1: " << pt2.get<std::string>("root.<xmlattr>.attr1") << std::endl;
        std::cout << "attr2: " << pt2.get<std::string>("root.<xmlattr>.attr2") << std::endl;
    } catch (const std::exception& e) {
        std::cerr << "Error: " << e.what() << std::endl;
    }

    return 0;
}
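
Normalizing before writing keeps the output stable, but it drops the original tabs and newlines. If the goal is to round-trip them, the specification leaves one route open: whitespace written as a character reference (`&#9;`, `&#10;`, `&#13;`) is not replaced by a space during attribute-value normalization. Below is a rough sketch of that escaping. `escape_attribute_value` is a hypothetical name, not part of Boost, and as far as I can tell it would have to replace the writer's own escaping step rather than run as a pass over the tree, because write_xml escapes `&` itself.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: escapes an attribute value for a double-quoted
// attribute so that tabs, line feeds, and carriage returns survive
// attribute-value normalization in a conforming parser.
std::string escape_attribute_value(const std::string& value) {
    std::string out;
    out.reserve(value.size());
    for (char c : value) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '"':  out += "&quot;"; break;
            case '\t': out += "&#9;";   break;  // character references are kept as-is
            case '\n': out += "&#10;";  break;
            case '\r': out += "&#13;";  break;
            default:   out += c;        break;
        }
    }
    return out;
}
```

With this, `"Hello\tWorld\n"` would be written as `Hello&#9;World&#10;`, and a parser that expands character references should give back the original string.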
