Monday, September 3, 2012

Solving XML Invalid Character

If you came here, you probably ran into an exception similar to "hexadecimal value 0xXX, is an invalid character."

So why are we getting exceptions?

Back in the days of XML 1.0, there was a character range limit:

Char   ::=   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]   /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

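XML 1.1 relaxed this. Every character from #x1 up is now allowed, but the control characters listed under RestrictedChar must be written as character references (for example &#x3;) and their use is discouraged: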

Char   ::=   [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]   /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
RestrictedChar   ::=   [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]
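
To make the two grammars easier to compare, here is a minimal sketch that transcribes the productions above into plain range checks on a Unicode code point. The class and method names (XmlCharRanges, IsChar10, IsChar11, IsRestricted11) are made up for this post and are not part of .NET.

```csharp
using System;

static class XmlCharRanges
{
    // XML 1.0 Char production: tab, LF, CR, then everything from #x20 up
    // (minus surrogates, FFFE and FFFF).
    public static bool IsChar10(int cp)
    {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // XML 1.1 Char production: everything from #x1 up
    // (minus surrogates, FFFE and FFFF). NUL is still excluded.
    public static bool IsChar11(int cp)
    {
        return (cp >= 0x1 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // XML 1.1 RestrictedChar production: control characters that are
    // legal in 1.1 but only as character references.
    public static bool IsRestricted11(int cp)
    {
        return (cp >= 0x1 && cp <= 0x8)
            || (cp >= 0xB && cp <= 0xC)
            || (cp >= 0xE && cp <= 0x1F)
            || (cp >= 0x7F && cp <= 0x84)
            || (cp >= 0x86 && cp <= 0x9F);
    }

    static void Main()
    {
        // A few sample code points: NUL, ETX, tab, 'A', NEL, and FFFE.
        foreach (int cp in new[] { 0x0, 0x3, 0x9, 0x41, 0x85, 0xFFFE })
        {
            Console.WriteLine("U+{0:X4}: 1.0={1} 1.1={2} restricted={3}",
                cp, IsChar10(cp), IsChar11(cp), IsRestricted11(cp));
        }
    }
}
```

For single UTF-16 units, .NET 4.0 and later also offer XmlConvert.IsXmlChar, which should agree with the XML 1.0 check here.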

So why am I still getting error messages?
Well, Microsoft never updated its XML implementations to the new standard. Even if you force the version in the declaration to 1.1, they still throw an exception.
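
A quick way to see this for yourself is to feed the default parser a document containing a control character reference, once declared as 1.0 and once as 1.1. This is only a sketch: DefaultParserDemo and TryParse are names invented for this post, and the exact exception text may vary between .NET versions.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

static class DefaultParserDemo
{
    // Returns the XmlException message, or null if the document parsed fine.
    public static string TryParse(string xml)
    {
        try
        {
            XElement.Parse(xml);
            return null;
        }
        catch (XmlException ex)
        {
            return ex.Message;
        }
    }

    static void Main()
    {
        // Declared as 1.0: the character reference itself is rejected.
        Console.WriteLine(TryParse("<?xml version='1.0'?><root val='&#x3;' />"));
        // Declared as 1.1: the version number is rejected before the parser gets any further.
        Console.WriteLine(TryParse("<?xml version='1.1'?><root val='&#x3;' />"));
    }
}
```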

What can I do?
First, make sure you really need these characters in your XML and that they're not just noise (they are mostly invisible). Check that the applications and services which are going to consume these XML files will not fail on them. Also consider encoding the affected values as something like base64.
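
If the characters turn out to be noise, or if you'd rather stay within XML 1.0, both options can be sketched in a few lines. XmlTextCleaner and its methods are hypothetical helpers written for this post, not an existing API. StripInvalid drops anything outside the XML 1.0 Char range, while ToBase64 and FromBase64 keep the original value intact at the cost of readability.

```csharp
using System;
using System.Text;

static class XmlTextCleaner
{
    // True for UTF-16 units that XML 1.0 allows on their own (BMP, non-surrogate).
    static bool IsValidBmpChar(char c)
    {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD);
    }

    // Removes every character XML 1.0 does not allow.
    // Well-formed surrogate pairs are kept, lone surrogates are dropped.
    public static string StripInvalid(string text)
    {
        StringBuilder sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (IsValidBmpChar(c))
            {
                sb.Append(c);
            }
            else if (char.IsHighSurrogate(c) && i + 1 < text.Length
                     && char.IsLowSurrogate(text[i + 1]))
            {
                sb.Append(c).Append(text[i + 1]);
                i++; // skip the low surrogate we just copied
            }
        }
        return sb.ToString();
    }

    // Encodes the value as base64 over its UTF-8 bytes, so any character survives.
    public static string ToBase64(string text)
    {
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(text));
    }

    // Reverses ToBase64.
    public static string FromBase64(string base64)
    {
        return Encoding.UTF8.GetString(Convert.FromBase64String(base64));
    }

    static void Main()
    {
        Console.WriteLine(StripInvalid("a\u0003b"));           // control character removed
        Console.WriteLine(ToBase64("\u0003"));                 // safe to store in XML 1.0
        Console.WriteLine(FromBase64(ToBase64("a\u0003b")).Length);
    }
}
```

Stripping loses information, so it only makes sense when the characters carry no meaning. With base64, whoever reads the file has to know which values to decode.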

Here's an example of how to write XML 1.1:


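// Requires: using System.Text; using System.Xml; using System.Xml.Linq;
StringBuilder sb = new StringBuilder();
XmlWriterSettings xws = new XmlWriterSettings();
xws.Encoding = Encoding.UTF8;
xws.Indent = true;
// Disable character checking so the writer does not reject \x03
xws.CheckCharacters = false;

using (XmlWriter xw = XmlWriter.Create(sb, xws))
{
    // Write your own header so the declaration says version 1.1
    xw.WriteProcessingInstruction("xml", "version='1.1'");
    XElement doc = new XElement("root");
    doc.Add(
            new XElement("test",
                    new XAttribute("val", "\x03")));
    // Use WriteTo instead of Save, since Save tries to write its own XML declaration
    doc.WriteTo(xw);
}
// Disposing the writer flushes it; the finished document is now in sb
string xmlstring = sb.ToString();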


And an example of how to read XML 1.1:


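// Requires: using System.IO; using System.Xml; using System.Xml.Linq;
TextReader tr = new StringReader(xmlstring);
// Skip the declaration line, otherwise the reader throws "Version number '1.1' is invalid."
// This assumes the declaration sits alone on the first line.
tr.ReadLine();

XmlReaderSettings xrs = new XmlReaderSettings();
// Disable character checking so the control characters are accepted
xrs.CheckCharacters = false;
XElement xmldoc;
using (XmlReader xr = XmlReader.Create(tr, xrs))
{
    xmldoc = XElement.Load(xr);
}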


You can find a test project here:
https://github.com/drorgl/ForBlog/tree/master/XmlIllegalCharacterTests
