Skip to content

fix XML encoding of text value #200

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

fix XML encoding of text value #200

wants to merge 3 commits into from

Conversation

aidarxa
Copy link

@aidarxa aidarxa commented May 10, 2025

Problem: If the object path contains the '&' ampersand, it will not be escaped when forming an XML request, as required by the specification.

The playback path:

  1. Upload an object containing &
  2. Pass this object to std::listminio::s3::DeleteObject objects;
  3. Call Client::RemoveObjects

A message will be received in response: unable to do remove objects; MalformedXML: The XML you provided was not well-formed or did not validate against our published schema. (XML syntax error on line 1: invalid character entity &file (no semicolon)) for the test/test&file object.

This PR escapes the ampersand using the string &.

@harshavardhana
Copy link
Member

This should be handled properly with XML parsing itself, the replace code is not correct.

We need a bit more generic solution that doesn't involve custom changes

@balamurugana
Copy link
Member

@aidarxa What is your server?

@aidarxa
Copy link
Author

aidarxa commented May 10, 2025

This should be handled properly with XML parsing itself, the replace code is not correct.

We need a bit more generic solution that doesn't involve custom changes

Are there any reasons why the current implementation baseclient includes but does not use pugixml for this task?

@aidarxa
Copy link
Author

aidarxa commented May 10, 2025

@aidarxa What is your server?

I'm not sure if I understood the question correctly.

I use MinIO.

@balamurugana
Copy link
Member

balamurugana commented May 10, 2025

@harshavardhana As the minio server is not returning URL encoded values, is it a same behavior like AWS S3? I suspect it would fail for many unsupported characters.

Copy link
Member

@balamurugana balamurugana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A better fix (hack) is like below

diff --git a/include/miniocpp/utils.h b/include/miniocpp/utils.h
index a46ea24..a354e0a 100644
--- a/include/miniocpp/utils.h
+++ b/include/miniocpp/utils.h
@@ -95,6 +95,9 @@ std::string Join(const std::vector<std::string>& values,
 // EncodePath does URL encoding of path. It also normalizes multiple slashes.
 std::string EncodePath(const std::string& path);
 
+// XMLEncode does XML encoding of value.
+std::string XMLEncode(const std::string& value);
+
 // Sha256hash computes SHA-256 of data and return hash as hex encoded value.
 std::string Sha256Hash(std::string_view str);
 
diff --git a/src/baseclient.cc b/src/baseclient.cc
index fce46c9..0b46aa4 100644
--- a/src/baseclient.cc
+++ b/src/baseclient.cc
@@ -1459,7 +1459,7 @@ RemoveObjectsResponse BaseClient::RemoveObjects(RemoveObjectsApiArgs args) {
   if (args.quiet) ss << "<Quiet>true</Quiet>";
   for (auto& object : args.objects) {
     ss << "<Object>";
-    ss << "<Key>" << object.name << "</Key>";
+    ss << "<Key>" << utils::XMLEncode(object.name) << "</Key>";
     if (!object.version_id.empty()) {
       ss << "<VersionId>" << object.version_id << "</VersionId>";
     }
diff --git a/src/utils.cc b/src/utils.cc
index 6f16a66..70b62bd 100644
--- a/src/utils.cc
+++ b/src/utils.cc
@@ -56,6 +56,7 @@
 #include <map>
 #include <memory>
 #include <ostream>
+#include <pugixml.hpp>
 #include <regex>
 #include <sstream>
 #include <streambuf>
@@ -222,6 +223,14 @@ std::string EncodePath(const std::string& path) {
   return out;
 }
 
+std::string XMLEncode(const std::string& value) {
+  pugi::xml_document doc;
+  doc.append_child(pugi::node_pcdata).set_value(value);
+  std::ostringstream out;
+  doc.print(out);
+  return out.str();
+}
+
 std::string Sha256Hash(std::string_view str) {
   EVP_MD_CTX* ctx = EVP_MD_CTX_create();
   if (ctx == nullptr) {

The right way to fix is construct xml_document instead of constructing XML manually.

@balamurugana balamurugana changed the title Fixed XML construct on BaseClient::RemoveObjects fix XML encoding of text value May 10, 2025
@aidarxa aidarxa requested a review from balamurugana May 10, 2025 18:32
@aidarxa
Copy link
Author

aidarxa commented May 12, 2025

@balamurugana Can you perform the merge, if everything is ok, please?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants