Encoding XML in UTF-8 with .NET

The solution described here was inspired by the blog post found at http://rlacovara.blogspot.com/2011/02/how-to-create-xml-in-c-with-utf-8.html.  It explains how to replace the default UTF-16 encoding with UTF-8.  I have implemented a variation of this.  In addition, a more generic solution is available at http://www.experts-exchange.com/Programming/Languages/C_Sharp/Q_20554526.html.  That one, which I have not implemented, allows the output encoding to be specified.

By default, XML documents produced as strings using C# and the .NET XmlSerializer class (that is, serialized to a StringWriter) are encoded as UTF-16.  I recently needed to change this to the more commonly-used UTF-8, and learned a few things along the way.

The first thing that I discovered (and perhaps should have already known) is that internally .NET stores all string representations as UTF-16.  That is why, if you don’t change the default encoding, the XML is produced as UTF-16.

Next, I found that the Encoding property of the StringWriter class is read-only, so you can interrogate the default encoding (and see that it is in fact UTF-16) but cannot change it. 
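A quick check makes this visible.  The following is only a minimal sketch; the EncodingCheck class name is made up for illustration.

```csharp
using System;
using System.IO;

public static class EncodingCheck
{
    public static void Main()
    {
        using (var writer = new StringWriter())
        {
            // StringWriter.Encoding exposes a getter only, so the value can be
            // inspected but not assigned.
            Console.WriteLine(writer.Encoding.WebName);   // prints "utf-16"
        }
    }
}
```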

As I learned from the blog posts that I referenced above, the solution to changing the default UTF-16 encoding is to subclass the native .NET StringWriter class and override the default Encoding property value.

Following is a solution for producing a UTF-8-encoded XML document.  The “StringWriterUtf8” class is the key to the solution.  It inherits from the native System.IO.StringWriter class and overrides the Encoding property (returning Encoding.UTF8 instead of the default Encoding.Unicode, which is the .NET name for UTF-16).  Using an instance of this class as the target for the XML serialization output produces UTF-8 output.

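using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml.Serialization;

[Serializable]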
public class ClassToSerialize
{
   public string ToXml()
   {
       System.Xml.Serialization.XmlSerializer xml = new XmlSerializer(typeof(ClassToSerialize));
       StringWriterUtf8 text = new StringWriterUtf8();
       xml.Serialize(text, this);
       return text.ToString();
   }

   private String _errorMessage = String.Empty;
   public string Message
   {
       get { return _errorMessage; }
       set { _errorMessage = value; }
   }

   private List<string> _citations = new List<string>();
   public List<string> citations
   {
       get { return _citations; }
       set { _citations = value; }
   }
}

// Subclass the StringWriter class and override the default encoding.  This
// allows us to produce XML encoded as UTF-8.
public class StringWriterUtf8 : System.IO.StringWriter
{
   public override Encoding Encoding
   {
       get
       {
           return Encoding.UTF8;
       }
   }
}
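The more generic approach mentioned at the top of this post, where the encoding is supplied by the caller, could look roughly like the sketch below.  I have not used this in production; the StringWriterWithEncoding, Note, and EncodingDemo names are assumptions made for the example and do not come from either referenced post.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical generalization of StringWriterUtf8: the encoding is passed in.
public sealed class StringWriterWithEncoding : StringWriter
{
    private readonly Encoding _encoding;

    public StringWriterWithEncoding(Encoding encoding)
    {
        if (encoding == null) throw new ArgumentNullException("encoding");
        _encoding = encoding;
    }

    // The XML writer reads this property when it emits the XML declaration.
    public override Encoding Encoding
    {
        get { return _encoding; }
    }
}

// Small illustrative type to serialize.
public class Note
{
    public string Message { get; set; }
}

public static class EncodingDemo
{
    public static string Serialize(Note note, Encoding encoding)
    {
        var serializer = new XmlSerializer(typeof(Note));
        using (var writer = new StringWriterWithEncoding(encoding))
        {
            serializer.Serialize(writer, note);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        string xml = Serialize(new Note { Message = "hello" }, Encoding.UTF8);
        Console.WriteLine(xml);   // the XML declaration reads encoding="utf-8"

        // The result is still a .NET string (UTF-16 in memory).  Encode it
        // explicitly when actual UTF-8 bytes are needed, e.g. for a file.
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(xml);
        Console.WriteLine(utf8Bytes.Length);
    }
}
```

Passing the encoding through the constructor keeps a single writer class usable for any target encoding, at the cost of one extra argument at each call site.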

Catching Unhandled Exceptions in ASP.NET

There are various methods that can be used to catch unhandled exceptions in an ASP.NET application.  The appropriate method to use depends on the nature of the exception being thrown.  This post walks through examples of several different types of “unhandled” exceptions and shows how to catch them.

This investigation into unhandled exceptions was initiated by a stack overflow exception being thrown in a production application.  That type of error proved to be the most challenging “unhandled exception” to handle, not least because of some incomplete or unclear documentation.

The following examples were tested in an ASP.NET WebForms (yes, boring old WebForms) application compiled with ASP.NET 4.0 and hosted with IIS 7.5 on Windows 7.

One type of “unhandled exception” is an exception thrown by a section of code that is not wrapped in a try-catch block.  These types of exceptions can be caught by adding an exception handler to the Application_Error event in the global.asax file of a web application.

Most ASP.NET developers are familiar with the Application_Error event.  The following example shows an implementation of this event handler that catches and logs unhandled exceptions.  Notice that the InnerException of the Exception is what is actually logged.  This is because the original exception is wrapped in an HttpUnhandledException by the time it is caught by the Application_Error event.

void Application_Error(object sender, EventArgs e)
{
   Exception ex = Server.GetLastError();

   // The original error may have been wrapped in a HttpUnhandledException,
   // so we need to log the details of the InnerException.
   ex = ex.InnerException ?? ex;

   try
   {
       // Log the error
       string errMsg = string.Empty;
       if (ex.Message != null) errMsg = "Message:" + ex.Message + "\r\n";
       if (ex.StackTrace != null) errMsg += "Stack Trace:" + ex.StackTrace;
       // * WRITE TO LOG *
               
       Server.ClearError();
   }
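   catch
   {
       // Ignore failures while logging so that the redirect below still runs.
   }

   // URL-encode the message so that it is safe to pass in the query string.
   Response.Redirect("~/Error.aspx?err=" + Server.UrlEncode(ex.Message), false);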
}

A page with a single button can be used to test the Application_Error error handler.  The code for the button click event is shown here.

protected void btnException_Click(object sender, EventArgs e)
{
   // This will raise an exception, which we won’t handle here
   throw (new Exception("Test Exception"));
}

Another type of unhandled exception is an error (again not wrapped in a try-catch block) that occurs outside the normal request processing context of the ASP.NET runtime.  An example is an error that occurs on another thread.  An HttpModule that registers an event handler for the UnhandledException event of the current AppDomain can be used to catch such exceptions.

HTTP modules are classes, packaged in assemblies, that are called on every request.  In that respect they are similar to ISAPI filters.  Unlike ISAPI filters, they are written in managed code and are integrated with the ASP.NET application life cycle.  ASP.NET itself uses modules to implement features such as forms authentication and caching.  With regard to handling exceptions, the most important feature of HTTP modules is that they can consume application events.

The following is the complete code of a class that implements the IHttpModule interface.  It includes an event handler for UnhandledException events.

using System;
using System.Diagnostics;
using System.Globalization;
using System.IO;
using System.Runtime.InteropServices;
using System.Text;
using System.Threading;
using System.Web;

namespace WebMonitor {

   public class UnhandledExceptionModule : IHttpModule
   {
       static int _unhandledExceptionCount = 0;
       static string _sourceName = null;
       static object _initLock = new object();
       static bool _initialized = false;

       public void Init(HttpApplication app)
       {
           // Do this one time for each AppDomain.  Verify that we’re on the correct ASP.NET version and
           // that the EventLog has been properly configured.  If all is well, register an event handler for
           // unhandled exceptions.
           if (!_initialized) {
               lock (_initLock) {
                   if (!_initialized) {
                       string webenginePath = Path.Combine(RuntimeEnvironment.GetRuntimeDirectory(),
                                   "webengine.dll");

                       if (!File.Exists(webenginePath)) {
                           throw new Exception(String.Format(CultureInfo.InvariantCulture,
                               "Failed to locate webengine.dll at '{0}'.  This module requires .NET Framework 2.0.",
                               webenginePath));
                       }

                       FileVersionInfo ver = FileVersionInfo.GetVersionInfo(webenginePath);
                       _sourceName = string.Format(CultureInfo.InvariantCulture, "ASP.NET {0}.{1}.{2}.0",
                                                   ver.FileMajorPart, ver.FileMinorPart, ver.FileBuildPart);

                       if (!EventLog.SourceExists(_sourceName)) {
                           throw new Exception(String.Format(CultureInfo.InvariantCulture,
                               "There is no EventLog source named '{0}'. Module requires .NET Framework 2.0.",
                               _sourceName));
                       }

                       AppDomain.CurrentDomain.UnhandledException += 
                             new UnhandledExceptionEventHandler(OnUnhandledException);

                       _initialized = true;
                   }
               }
           }
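       }

       // IHttpModule requires Dispose; this module holds no resources to release.
       public void Dispose()
       {
       }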

       void OnUnhandledException(object o, UnhandledExceptionEventArgs e)
       {
           // Let this occur one time for each AppDomain.
           if (Interlocked.Exchange(ref _unhandledExceptionCount, 1) != 0) return;

           // Build a message containing the exception details
           StringBuilder message = new StringBuilder(
               "\r\n\r\nUnhandledException logged by UnhandledExceptionModule.dll:\r\n\r\nappId=");
           string appId = (string) AppDomain.CurrentDomain.GetData(".appId");
           if (appId != null) message.Append(appId);

           Exception currentException = null;
           for (currentException = (Exception)e.ExceptionObject; 
                  currentException != null;
                  currentException = currentException.InnerException) {
               message.AppendFormat("\r\n\r\ntype={0}\r\n\r\nmessage={1}\r\n\r\nstack=\r\n{2}\r\n\r\n",
                                    currentException.GetType().FullName,
                                    currentException.Message,
                                    currentException.StackTrace);
           }          

           // Log the information to the event log
           EventLog Log = new EventLog();
           Log.Source = _sourceName;
           Log.WriteEntry(message.ToString(), EventLogEntryType.Error);
       }
   }
}

To use the HttpModule within a web application, compile it and register the assembly in the web.config file, as shown here.

<system.webServer>
<modules runAllManagedModulesForAllRequests="true">
   <add type="WebMonitor.UnhandledExceptionModule" name="UnhandledExceptionModule"/>
</modules>
</system.webServer>

Testing this error handler is a bit more difficult, because the test needs to show that exceptions that bypass the Application_Error event handler are caught by the HttpModule.  An error needs to be thrown that is not caught by the “normal” ASP.NET error pipeline (for example, the Application_Error event). 

Again, start with a single button on a web page.  The click event of the button needs to spawn a thread that throws an exception which is not wrapped in a try-catch block.  Here is the code for the click event.

protected void btnUnhandled_Click(object sender, EventArgs e)
{
   // Queue the task.
   ThreadPool.QueueUserWorkItem(new WaitCallback(ThreadProc));

   // The Sleep gives the background thread time to run
   Thread.Sleep(1000);
}

// This thread procedure performs the task.
static void ThreadProc(Object stateInfo) {
   throw (new Exception("Test Unhandled exception"));
}

Because the exception happens on a separate thread, the Application_Error event does not catch it. However, the HttpModule does.

Note that such an HttpModule exception handler will also catch any exceptions that an Application_Error event handler in global.asax will catch. So, an HttpModule exception handler can be used in tandem with an Application_Error event handler, or in place of the Application_Error event.
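As a rough sketch of the "in place of" option, a module can subscribe to the HttpApplication.Error event, which is the event behind Application_Error.  The RequestErrorModule name and the Trace call below are illustrative assumptions, not part of the original sample; logging is left as a placeholder, as in the earlier example.

```csharp
using System;
using System.Diagnostics;
using System.Web;

namespace WebMonitor
{
    // Hypothetical module that handles the same request-pipeline errors that
    // Application_Error in global.asax would see.
    public class RequestErrorModule : IHttpModule
    {
        public void Init(HttpApplication app)
        {
            // Raised when an unhandled exception occurs while processing a request.
            app.Error += OnError;
        }

        public void Dispose()
        {
            // Nothing to release.
        }

        private static void OnError(object sender, EventArgs e)
        {
            HttpApplication app = (HttpApplication)sender;
            Exception ex = app.Server.GetLastError();
            if (ex == null) return;

            // Unwrap HttpUnhandledException, as in the Application_Error example.
            ex = ex.InnerException ?? ex;

            // * WRITE TO LOG *  (Trace is used here only as a stand-in)
            Trace.TraceError("{0}: {1}", ex.GetType().FullName, ex.Message);
        }
    }
}
```

A module like this would be registered in web.config in the same way as the UnhandledExceptionModule shown earlier.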

The final type of unhandled exception to examine is an exception that corrupts the state of the application.  Probably the best-known example of this is a stack overflow.  Because they require special handling, it might seem that exceptions like a StackOverflowException are simply unhandled exceptions that occur outside the normal request processing context of ASP.NET, just like the error in the previous example.  In fact, exceptions that corrupt the state of the application are a different class of exception entirely, and by definition cannot be caught.

This is true despite conflicting documentation that suggests that http modules can catch such errors, or that the legacyUnhandledExceptionPolicy setting in the aspnet.config file (located in the framework folder) can be modified to allow ASP.NET to handle such exceptions in a legacy manner (i.e. like ASP.NET 1.0 and 1.1). 
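For reference, the setting in question is normally written as shown below.  This is a sketch of the documented element rather than a recommendation; the surrounding contents of aspnet.config are omitted.

```xml
<configuration>
  <runtime>
    <!-- Asks the runtime to use the .NET 1.x policy for unhandled exceptions -->
    <legacyUnhandledExceptionPolicy enabled="true" />
  </runtime>
</configuration>
```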

Furthermore, some documentation suggests that stack overflow errors can be caught if the block of code throwing the error is decorated with the System.Security.SecurityCritical and System.Runtime.ExceptionServices.HandleProcessCorruptedStateExceptions attributes.  (This, of course, assumes that you know the block of code throwing the error.)

The following is the codebehind for a page that generates a stack overflow error by calling a recursive function that never exits.  It illustrates the use of the SecurityCritical and HandleProcessCorruptedStateExceptions attributes that are supposed to allow corrupted state exceptions, including stack overflows, to be caught.  The attributes have no effect; the exceptions are not caught by the try-catch block.

using System;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;

namespace UnhandledExceptionWebApp
{
   // Attribute doesn’t seem to work as advertised
   [System.Security.SecurityCritical]
   public partial class StackOverflow : System.Web.UI.Page
   {
       // Attribute doesn’t seem to work as advertised; the stack overflow is NOT caught 
       [System.Runtime.ExceptionServices.HandleProcessCorruptedStateExceptions]
       protected void Page_Load(object sender, EventArgs e)
       {
           try
           {
               this.Overflow(true);
           }
           catch (Exception ex)
           {
               // URL-encode the message so that it is safe to pass in the query string.
               Response.Redirect("~/Error.aspx?err=" + Server.UrlEncode(ex.Message), false);
           }
       }

       // Recursive function causes stack overflow
       private void Overflow(Boolean keepGoing)
       {
           if (keepGoing) this.Overflow(keepGoing);
       }
   }
}

In addition, if this page is added to an application that implements the previously discussed Application_Error and HttpModule event handlers, the stack overflow error is not caught.  Even changing the legacyUnhandledExceptionPolicy setting in the aspnet.config file has no effect.  The stack overflow exception is not caught by any of the error handlers.  It seems that all of the documentation that suggests various methods for capturing stack overflow exceptions is incorrect or misleading.

It appears that there is NO WAY to catch and log a stack overflow error.  So, how can a stack overflow exception be “handled”?

The answer is to use the Debug Diagnostic Tool from Microsoft (the latest version at the time of this writing is 1.2).   This tool includes a debugger service that can capture a dump file when a stack overflow occurs.  That file can then be analyzed to find the code that is causing the stack overflow.

Complete configuration and usage details for the Debug Diagnostic Tool are outside the scope of this post.  In brief, the steps to follow to capture a stack trace when a stack overflow exception occurs are:

  1. Install the Debug Diagnostic Tool
  2. Create a Rule to capture Stack Overflow exceptions and perform a Log Stack Trace action.
  3. Run the web application.
  4. Run the Debug Diagnostic Tool.
  5. Cause the exception to occur.

For more detailed information, see the documentation of the tool here.

When a stack overflow exception occurs, the Debug Diagnostic Tool will capture a stack trace and write it to a log file.  An example of the log contents can be seen here (with the function call that is producing the stack overflow highlighted):

[9/1/2011 11:10:24 PM] First chance exception – 0xc00000fd caused by thread with System ID: 4004
[9/1/2011 11:10:24 PM] Stack Trace
ChildEBP RetAddr Args to Child
WARNING: Frame IP not in any known module. Following frames may be wrong.
03c5300c 00bc0c74 017751b4 00000000 00000001 0xbc0c6e
03c53020 00bc0c74 017751b4 00000000 00000001 0xbc0c74
03c53034 00bc0c74 017751b4 00000000 00000001 0xbc0c74
… (the preceding line repeated many many times)

03c8eccc 00bc0c74 StackOverflowWebApp.StackOverflow.Overflow(Boolean)
03c8ece0 00bc0c74 StackOverflowWebApp.StackOverflow.Overflow(Boolean)
03c8ecf4 00bc0c74 StackOverflowWebApp.StackOverflow.Overflow(Boolean)
… (the preceding line repeated many many times)

03c8ed08 00bc0bb5 StackOverflowWebApp.StackOverflow.Page_Load(System.Object, System.EventArgs)
03c8ed54 0116d5cf System.Web.Util.CalliHelper.EventArgFunctionCaller(IntPtr, System.Object, System.Object, System.EventArgs)
03c8ed64 5d7d5694 System.Web.Util.CalliEventHandlerDelegateProxy.Callback(System.Object, System.EventArgs)
03c8ed78 5d7c8cbc System.Web.UI.Control.OnLoad(System.EventArgs)
03c8ed8c 5d7c8d1b System.Web.UI.Control.LoadRecursive()
03c8eda4 5d7c66e0 System.Web.UI.Page.ProcessRequestMain(Boolean, Boolean)
03c8efcc 5d7c5cad System.Web.UI.Page.ProcessRequest(Boolean, Boolean)
03c8f004 5d7c5bcf System.Web.UI.Page.ProcessRequest()

The log shows a stack trace which positively identifies the part of the code that is throwing the error (the StackOverflowWebApp.StackOverflow.Overflow(Boolean) method).

Note that the Debug Diagnostic Tool service is set to start automatically.  This may not be desirable, especially if the tool is only needed briefly to debug a particular error.  Also, the tool seems to affect the performance of the web site being debugged.  Use this tool carefully, especially if it must be pointed at a production web site.

In summary, unhandled exceptions in an ASP.NET application can be caught with an Application_Error event handler in the global.asax, or by creating a HttpModule to catch the AppDomain.UnhandledException event.  An HttpModule is required to catch unhandled exceptions that occur outside the normal processing of requests by the ASP.NET runtime.  For errors that corrupt the state of the application, such as stack overflow exceptions, use the Debug Diagnostic Tool to capture a stack trace at the time of the error.

The complete source code for an application that includes all of the examples shown here is available for download.  Please note that the web site should be compiled and hosted under IIS to ensure that the error handlers will behave properly.  Running the application in debugging mode from within Visual Studio produces different results than you will see in a production environment.  Visual Studio tries to help handle the errors, but that prevents some of the intended event handlers from working as expected, and does not allow for a complete understanding of how the various error handlers work outside the development environment.

St. Louis Day of .NET – S.O.L.I.D.

This is part of a series of posts containing my notes from the sessions I attended at the 2011 St. Louis Day of .NET conference.

This series does not attempt to give complete accounts of the information presented in each session; it is just a way to capture the bullet points, notes, and opinions that I recorded while attending the conference. I have previously posted a list of all of the session materials and sample code that I have been able to find online, so if you are looking for a more precise account of a session, try looking there.

My favorite presenter at this year’s conference was Steve Bohlen.  He presented at three sessions; I attended two: “Taming Dependency Chaos with Inversion of Control Containers” and “Refactoring to a SOLID Foundation”.  Both were excellent.  Following are my notes from the SOLID session.

Single Responsibility Principle

There should never be more than one reason for a class to change.  Each class should do one thing.

Open-Closed Principle

Software entities (classes, modules, functions, etc.) should be open for extension, but closed for modification.

Instead of this:

     public class Report
     {
          public void Print()
          {
          }
     }

Use this:

     public class Report
     {
          public virtual void Print()
          {
          }
     }

So that you can do this:

     public class Report2 : Report
     {
          public override void Print()
          {
          }
     }

In this case, the old working code ("Report" class) still works, and we have also added new functionality ("Report2" class).

Liskov Substitution Principle

Functions that use pointers or references to base classes must be able to use objects of derived classes without knowing it. (Polymorphism; important part highlighted)

Instead of:

     public class LetterReport
     {
          public virtual void Print()
          {
          }
     }

     public class TabloidReport : LetterReport
     {
          public override void Print()
          {
          }
     }

Here, TabloidReport overrides a particular kind of report (LetterReport).

Do this instead:

     public abstract class Report
     {
          public abstract void Print();
     }

     public class LetterReport : Report
     {
          public override void Print()
          {
          }
     }

     public class TabloidReport : Report
     {
          public override void Print()
          {
          }
     }

Now, the base class for the reports is truly generic (it’s not a particular kind of report).
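To make the substitution concrete, here is a small sketch of a caller that knows only about the abstract Report.  The ReportPrinter class and the Console output are my own additions for illustration; they were not part of the session.

```csharp
using System;
using System.Collections.Generic;

public abstract class Report
{
    public abstract void Print();
}

public class LetterReport : Report
{
    public override void Print() { Console.WriteLine("Letter"); }
}

public class TabloidReport : Report
{
    public override void Print() { Console.WriteLine("Tabloid"); }
}

public static class ReportPrinter
{
    // Hypothetical caller: it works with any Report without knowing the subtype.
    public static void PrintAll(IEnumerable<Report> reports)
    {
        foreach (Report report in reports)
        {
            report.Print();
        }
    }

    public static void Main()
    {
        // Prints "Letter" and then "Tabloid".
        PrintAll(new Report[] { new LetterReport(), new TabloidReport() });
    }
}
```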

Interface Segregation Principle

Clients should not be forced to depend upon interfaces that they do not use.

Do not build catch-all interfaces like this:

     public interface IDataAccess
     {
          void SetConnectionString();
          void Connect();
          Data GetReportData();
     }

Instead, interfaces should "build upon" other interfaces, as such (interface composition):

     public interface IDataAccess
     {
          void SetConnectionString();
          void Connect();
     }

     public interface IReportDataAccess : IDataAccess
     {
          Data GetReportData();
     }

Now, classes can select the interface that makes the most sense, rather than getting a single interface with everything.
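A sketch of what that selection might look like follows.  The ConnectionChecker, ReportBuilder, and FakeReportDataAccess classes, and the empty Data class, are hypothetical names added for the example.

```csharp
using System;

public class Data { }

public interface IDataAccess
{
    void SetConnectionString();
    void Connect();
}

public interface IReportDataAccess : IDataAccess
{
    Data GetReportData();
}

// Hypothetical client that only needs a connection, so it asks for IDataAccess.
public class ConnectionChecker
{
    private readonly IDataAccess _dataAccess;

    public ConnectionChecker(IDataAccess dataAccess) { _dataAccess = dataAccess; }

    public void Check()
    {
        _dataAccess.SetConnectionString();
        _dataAccess.Connect();
    }
}

// Hypothetical client that builds reports, so it asks for the richer interface.
public class ReportBuilder
{
    private readonly IReportDataAccess _dataAccess;

    public ReportBuilder(IReportDataAccess dataAccess) { _dataAccess = dataAccess; }

    public Data Build()
    {
        _dataAccess.Connect();
        return _dataAccess.GetReportData();
    }
}

// Simple stand-in implementation so that the sketch can be exercised.
public class FakeReportDataAccess : IReportDataAccess
{
    public bool Connected { get; private set; }

    public void SetConnectionString() { }
    public void Connect() { Connected = true; }
    public Data GetReportData() { return new Data(); }
}
```

ConnectionChecker never sees GetReportData, so a change to the reporting members cannot force a change in it.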

Dependency Inversion Principle

High level modules should not depend on low level modules.  Both should depend on abstractions.  Abstractions should not depend upon details. Details should depend upon abstractions.

This is where dependency injection and object composition come into play.  No easy code example to give here.
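Still, a minimal sketch of constructor injection gives the flavor.  The IReportPrinter, ConsolePrinter, ReportService, and Program names are made up for illustration and did not come from the session.

```csharp
using System;

// Abstraction that both the high-level and the low-level code depend on.
public interface IReportPrinter
{
    void Print(string text);
}

// Low-level detail: one possible implementation of the abstraction.
public class ConsolePrinter : IReportPrinter
{
    public void Print(string text) { Console.WriteLine(text); }
}

// High-level module: it receives its dependency through the constructor and
// never references ConsolePrinter directly.
public class ReportService
{
    private readonly IReportPrinter _printer;

    public ReportService(IReportPrinter printer)
    {
        if (printer == null) throw new ArgumentNullException("printer");
        _printer = printer;
    }

    public void Run()
    {
        _printer.Print("Monthly report");
    }
}

public static class Program
{
    public static void Main()
    {
        // The composition happens at the edge of the application.
        new ReportService(new ConsolePrinter()).Run();   // prints "Monthly report"
    }
}
```

An inversion of control container automates the wiring done by hand in Main, but the dependency direction is the same.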
