Description
https://github.com/dotnet/corefx/issues/34118 demonstrates that while XmlDocument.Save(string) creates BOM-less UTF-8 files in the absence of an encoding attribute, signaling UTF-8 encoding explicitly via an encoding attribute in the XML declaration unexpectedly creates a UTF-8 file with BOM.
This is problematic for two reasons:
-
From a cross-platform perspective: A document with a UTF-8 (pseudo-)BOM (Unicode signature) can cause problems in cross-platform use, because many utilities on Unix-like platforms, as well as, e.g., Java's standard libraries, neither expect nor know how to handle such a BOM.
- While the XML standard does mandate that a compliant parser must recognize a UTF-8 BOM, the reality is that XML files are often read as plain-text files.
-
From an internal-consistency perspective: UTF-8 files should be created without BOM, as has been the default since the inception of .NET; specifying UTF-8 explicitly should only produce a BOM if explicitly requested (although the standard does allow such BOMs).
- As an aside: a related intra-.NET inconsistency is that System.Text.Encoding.UTF8 returns an encoding that does produce a BOM, but this unexpected behavior is at least documented.
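The aside about System.Text.Encoding.UTF8 can be checked by looking at the preamble each encoding reports. The following is only a minimal sketch; the class name PreambleDemo and the helper PreambleLength are made up for illustration and are not part of the original issue.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class PreambleDemo
{
    // Number of signature (BOM) bytes the encoding would write at the start of a stream.
    public static int PreambleLength(Encoding encoding) => encoding.GetPreamble().Length;

    public static void Main()
    {
        // The static Encoding.UTF8 instance reports the 3-byte signature EF BB BF.
        Console.WriteLine(PreambleLength(Encoding.UTF8));           // 3

        // A UTF8Encoding constructed with 'false' reports an empty preamble.
        Console.WriteLine(PreambleLength(new UTF8Encoding(false))); // 0
    }
}
```

Passing false to the UTF8Encoding constructor is the usual way to get a BOM-less UTF-8 encoding when an API requires an explicit Encoding instance.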
@krwq feels that fixing this inconsistency is too much of a breaking change, so the behavior should be documented; to summarize:
When the XmlDocument.Save(string) overload is used:
-
In the absence of an encoding attribute, the .Save(string) method creates a UTF-8 file without BOM, in line with .NET's default and suitable for cross-platform use.
-
If a UTF-8-valued encoding attribute is present, the .Save(string) method creates a UTF-8-encoded file with BOM.
- Note that it doesn't matter whether a given document was originally read from a file / a string with an explicit encoding="UTF-8" attribute (the case of UTF-8 doesn't matter) in its XML declaration, or whether a UTF-8 encoding attribute was created programmatically via XmlDocument.CreateXmlDeclaration().
-
@krwq demonstrates a workaround based on explicit creation of an XmlWriter instance here.
-
Finally, it's also worth mentioning that using an encoding value that isn't recognized (as one of the default / registered .NET encodings) causes an exception on calling .Save() (but not on reading).
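To make the summary above concrete, here is a rough sketch that saves the same document once via .Save(string) and once via an explicitly configured XmlWriter, then inspects the first bytes of each file. It is not @krwq's exact code; the class name XmlBomDemo, the helpers HasUtf8Bom and SaveWithoutBom, and the temp-file names are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper class, for illustration only.
public static class XmlBomDemo
{
    // True if the file starts with the UTF-8 signature bytes EF BB BF.
    public static bool HasUtf8Bom(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        return bytes.Length >= 3
            && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
    }

    // Saves through an XmlWriter whose encoding has an empty preamble,
    // so no BOM is written even if the XML declaration says encoding="UTF-8".
    public static void SaveWithoutBom(XmlDocument doc, string path)
    {
        var settings = new XmlWriterSettings
        {
            Encoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false)
        };
        using (XmlWriter writer = XmlWriter.Create(path, settings))
        {
            doc.Save(writer);
        }
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<?xml version=\"1.0\" encoding=\"UTF-8\"?><root>caf\u00e9</root>");

        string viaSave = Path.Combine(Path.GetTempPath(), "bom-demo-save.xml");
        string viaWriter = Path.Combine(Path.GetTempPath(), "bom-demo-writer.xml");

        // Per the behavior described in this issue, this file is expected to start with a BOM.
        doc.Save(viaSave);
        SaveWithoutBom(doc, viaWriter);

        Console.WriteLine($"Save(string): BOM = {HasUtf8Bom(viaSave)}");
        Console.WriteLine($"XmlWriter:    BOM = {HasUtf8Bom(viaWriter)}");
    }
}
```

Routing the save through XmlWriterSettings.Encoding keeps the encoding attribute in the XML declaration while leaving the decision about the BOM to the caller.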