Encoding XML in UTF-8 with .NET

The solution described here was inspired by the blog post found at http://rlacovara.blogspot.com/2011/02/how-to-create-xml-in-c-with-utf-8.html. It explains how to replace the default UTF-16 encoding with UTF-8, and I have implemented a variation of it. In addition, a more generic solution is available at http://www.experts-exchange.com/Programming/Languages/C_Sharp/Q_20554526.html. That one (which I have not implemented) allows the output encoding to be varied.

By default, XML documents produced in C# by serializing with the .NET XmlSerializer class into a StringWriter are declared as UTF-16.  I recently needed to change this to the more commonly used UTF-8, and learned a few things along the way.

The first thing that I discovered (and perhaps should have already known) is that internally .NET stores all strings as UTF-16.  A StringWriter therefore reports UTF-16 as its encoding, and that is why, if you don’t change the default, the XML declaration says UTF-16.

Next, I found that the Encoding property of the StringWriter class is read-only, so you can interrogate the default encoding (and see that it is in fact UTF-16) but cannot change it. 
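The read-only default is easy to check for yourself. The following is a minimal sketch; the class and method names (DefaultEncodingDemo, DefaultWriterEncodingName) are made up for illustration and are not part of the solution below.

```csharp
using System;
using System.IO;
using System.Text;

public static class DefaultEncodingDemo
{
    // Returns the IANA name of the encoding that a plain StringWriter reports.
    public static string DefaultWriterEncodingName()
    {
        using (StringWriter writer = new StringWriter())
        {
            return writer.Encoding.WebName;
        }
    }

    public static void Main()
    {
        // Prints "utf-16".
        Console.WriteLine(DefaultWriterEncodingName());

        // The property has no setter, so a line like the following
        // would not compile:
        // new StringWriter().Encoding = Encoding.UTF8;
    }
}
```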

As I learned from the blog posts that I referenced above, the solution to changing the default UTF-16 encoding is to subclass the native .NET StringWriter class and override the default Encoding property value.

Following is a solution for producing a UTF-8-encoded XML document.  The “StringWriterUtf8” class is the key to the solution.  It inherits from the native System.IO.StringWriter class and overrides the Encoding property, returning Encoding.UTF8 instead of the default UTF-16 encoding (which .NET exposes as Encoding.Unicode; there is no Encoding.UTF16).  Using an instance of this class as the target for the XML serialization output produces a document declared as UTF-8.

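using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml.Serialization;

[Serializable]
public class ClassToSerialize
{
   // Serializes this instance to an XML string whose declaration says UTF-8.
   public string ToXml()
   {
       XmlSerializer xml = new XmlSerializer(typeof(ClassToSerialize));
       using (StringWriterUtf8 text = new StringWriterUtf8())
       {
           xml.Serialize(text, this);
           return text.ToString();
       }
   }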

   private String _errorMessage = String.Empty;
   public string Message
   {
       get { return _errorMessage; }
       set { _errorMessage = value; }
   }

   private List<string> _citations = new List<string>();
   public List<string> citations
   {
       get { return _citations; }
       set { _citations = value; }
   }
}

// Subclass the StringWriter class and override the default encoding.  This
// allows us to produce XML encoded as UTF-8.
public class StringWriterUtf8 : System.IO.StringWriter
{
   public override Encoding Encoding
   {
       get
       {
           return Encoding.UTF8;
       }
   }
}
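
For the more generic approach mentioned at the start (which I have not implemented in my own project), something along the following lines should work. This is only a sketch under stated assumptions: StringWriterWithEncoding, Note, and EncodingDemo are hypothetical names invented for this example, and the code is not taken from the linked answer.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical generalization of StringWriterUtf8: the encoding to report
// is passed to the constructor instead of being hard-coded.
public class StringWriterWithEncoding : StringWriter
{
    private readonly Encoding _encoding;

    public StringWriterWithEncoding(Encoding encoding)
    {
        if (encoding == null)
        {
            throw new ArgumentNullException("encoding");
        }
        _encoding = encoding;
    }

    public override Encoding Encoding
    {
        get { return _encoding; }
    }
}

// Hypothetical sample type used only to have something to serialize.
public class Note
{
    public string Message { get; set; }
}

public static class EncodingDemo
{
    // Serializes a Note to a string; the XML declaration names the
    // encoding reported by the writer.
    public static string Serialize(Note note, Encoding encoding)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Note));
        using (StringWriterWithEncoding writer = new StringWriterWithEncoding(encoding))
        {
            serializer.Serialize(writer, note);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        string xml = Serialize(new Note { Message = "hello" }, Encoding.UTF8);
        Console.WriteLine(xml); // the declaration reads encoding="utf-8"

        // The returned string is still held in memory as UTF-16. The bytes
        // become UTF-8 only when the text is encoded for output.
        byte[] bytes = new UTF8Encoding(false).GetBytes(xml);
        Console.WriteLine(bytes.Length);
    }
}
```

Keep in mind that overriding Encoding only changes what the writer reports, and therefore what appears in the XML declaration. When you save or transmit the result, write it out with the same encoding so that the declaration and the actual bytes agree.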
