When failure is not an option

At one point or another, every developer comes across code that isn’t stable. Maybe it’s a poorly supported third-party component, or maybe one of your internal developers is a bit of a wild child. In either case, you’re going to need to pave the way for stability, and I’m here to help.

The first key to stability is the obvious try..catch exception handling block: provided you catch a specific enough exception, your code can continue on its way. A bigger issue arises when the code you’re calling is multi-threaded. While threads are great for performance, they can be a killer for stability if not dealt with correctly. The problem is that when an exception is raised in a thread and bubbles all the way to the top of the call stack, the thread is torn down along with the rest of your application (except in a couple of special cases). If you’re running an ASP.NET site, one flaky call killing your site is not cool! Fortunately you have a couple of options. I can’t really recommend the first one, but it’s important to know about it.
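Before getting to those options: if the thread is one you start yourself, the simplest defence is to make sure nothing escapes its entry point. The following is a minimal sketch rather than anything from a library. SafeThreadRunner and RunAndCapture are names I’ve made up for illustration, and it assumes C# 3.0 or later for the lambda syntax. It doesn’t help with threads that the unstable code spins up internally, which is the harder problem the options below try to tackle.

```csharp
using System;
using System.Threading;

// Hypothetical helper: runs a delegate on its own thread and hands back
// whatever exception it threw, instead of letting it reach the top of
// the thread's call stack.
public static class SafeThreadRunner
{
    // Returns the exception thrown by 'work', or null if it completed normally.
    public static Exception RunAndCapture(Action work)
    {
        Exception captured = null;

        Thread thread = new Thread(() =>
        {
            try
            {
                work();
            }
            catch (Exception ex)
            {
                // Nothing escapes the thread; the caller decides what to do
                captured = ex;
            }
        });

        thread.Start();
        thread.Join(); // wait for the work to finish before reading 'captured'
        return captured;
    }
}
```

The caller can then log or rethrow the returned exception on its own terms.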

Option 1: .NET 1.0 / 1.1 Legacy Exception Handling

Back in the olden days of .NET, threads used to have what’s known as a ‘backstop’. That meant any exceptions would be caught and the application would keep running. There are a couple of obvious downsides to this, which led to the change in how exceptions were handled in .NET 2.0. Firstly, it’s really hard to debug exceptions you don’t know about, and having exceptions disappear into the void doesn’t help anybody. Secondly, when an exception kills a thread, the chances are your data is going to be left in a corrupt state, which is just going to cause issues further down the line. Despite its flaws, you can still revert to the backstop in 2.0 with a short piece of XML in your configuration file (just to reiterate, I don’t recommend this approach):

<legacyUnhandledExceptionPolicy enabled="1" />
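For context, that element lives inside the runtime section of the application configuration file. A minimal file would look something like the following; treat it as a sketch of where the element sits, not a complete configuration:

```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <runtime>
    <!-- Revert to the .NET 1.x backstop behaviour (not recommended) -->
    <legacyUnhandledExceptionPolicy enabled="1" />
  </runtime>
</configuration>
```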

Option 2: AppDomain to the rescue

Option 2 is a little more complicated than adding an XML element to your configuration file, but it’s also a lot more powerful.

I originally claimed that using an AppDomain makes it possible for a thread to raise an exception without the entire application dying. It looks like I’m wrong about this one: a thread failing with an unhandled exception will still terminate the process. What an AppDomain does give you is notification of any unhandled exception in any thread within that application domain, and the ability to unload and recreate the AppDomain, meaning you won’t be working with muddied state.

So here’s how you do it in 3 easy steps:

1: Wrap your method up in a class that inherits from MarshalByRefObject.

MarshalByRefObject is what makes life easy when communicating between AppDomains. Inheriting from it will allow you to call methods on classes in the other domain. I’ve included a couple of method stubs that we’ll fill in shortly:

    public sealed class AppDomainMethodCaller : MarshalByRefObject
    {
        // Default constructor
        public AppDomainMethodCaller()
        {
        }

        public static void RunUnstableMethod()
        {
            // This method will create the domain to call the unstable code
        }

        private void RunUnstableMethodInternal()
        {
            // This is the method that calls the actual unstable code
        }

    }

2: Add a method to create the object in a new domain.

Now we need to create a new application domain that will call our unstable class. The following code spins up the AppDomain with the same settings as the current domain. This is important because otherwise the new domain won’t know where to look for our application’s libraries or configuration.

        public static void RunUnstableMethod()
        {
            AppDomain domain = null;
            try
            {
                // Create the domain
                domain = AppDomain.CreateDomain("AppDomainMethodCaller",
                    null,
                    AppDomain.CurrentDomain.SetupInformation);

                // Get the type information for our instance
                Type targetType = typeof(AppDomainMethodCaller);

                // Create the instance in the remote domain
                AppDomainMethodCaller instance = 
                    (AppDomainMethodCaller)domain.CreateInstanceAndUnwrap(
                        targetType.Assembly.FullName,
                        targetType.FullName);

                // Run the method
                instance.RunUnstableMethodInternal();

            }
            finally
            {
                // Make sure the remote domain is cleaned up
                if (domain != null)
                    AppDomain.Unload(domain);
            }

        }

3: Don’t forget about the errors!

This is really important, and I don’t think I can stress it enough: don’t forget to log your exceptions. If you don’t and your application fails, you won’t know why.

There are two kinds of exception we need to deal with here. The first is exceptions raised directly by the call, the second is exceptions raised on threads within the domain. The first kind can be handled by a catch block in the code that calls into the domain; the second can be dealt with by using the AppDomain.UnhandledException event. The next question is what to do with these exceptions. In my example I’m going to raise an event and let the calling code decide.

First of all, define the event and write the code to raise it:

        public static event EventHandler<UnhandledExceptionEventArgs> UnhandledException;

        private static void OnUnhandledException(object exception, bool isTerminating)
        {
            if (UnhandledException != null)
                UnhandledException(null, new UnhandledExceptionEventArgs(exception, isTerminating));
        }

With that in place we can now add the catch block to our try statement:

            catch (Exception ex)
            {
                OnUnhandledException(ex, true);
            }

We can define a method that the AppDomain will use to log its exceptions:

        private static void AppDomain_UnhandledException(object sender, UnhandledExceptionEventArgs e)
        {
            OnUnhandledException(e.ExceptionObject, e.IsTerminating);
        }

And finally add the handler:

                domain.UnhandledException += new UnhandledExceptionEventHandler(AppDomain_UnhandledException);
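Since the pieces above are scattered across three steps, here is one way they might fit together in a single class. Treat it as a sketch under a few assumptions: it targets the .NET Framework (AppDomain.CreateDomain isn’t available on newer runtimes), the body of RunUnstableMethodInternal is a made-up stand-in that simply throws, and I’ve declared that method public here to keep the cross-domain call unambiguous, where the stub earlier declares it private. Also bear in mind that static members are per-AppDomain, so it’s worth verifying in which domain the AppDomain_UnhandledException handler actually ends up raising the event.

```csharp
using System;

public sealed class AppDomainMethodCaller : MarshalByRefObject
{
    // Raised when the unstable code fails
    public static event EventHandler<UnhandledExceptionEventArgs> UnhandledException;

    public static void RunUnstableMethod()
    {
        AppDomain domain = null;
        try
        {
            // Create the domain with the same settings as the current one
            domain = AppDomain.CreateDomain("AppDomainMethodCaller",
                null,
                AppDomain.CurrentDomain.SetupInformation);

            // Subscribe to unhandled exceptions from threads in the new domain
            domain.UnhandledException +=
                new UnhandledExceptionEventHandler(AppDomain_UnhandledException);

            // Create the instance in the remote domain
            Type targetType = typeof(AppDomainMethodCaller);
            AppDomainMethodCaller instance =
                (AppDomainMethodCaller)domain.CreateInstanceAndUnwrap(
                    targetType.Assembly.FullName,
                    targetType.FullName);

            // Run the method
            instance.RunUnstableMethodInternal();
        }
        catch (Exception ex)
        {
            // Exceptions thrown directly by the call arrive here
            OnUnhandledException(ex, true);
        }
        finally
        {
            // Make sure the remote domain is cleaned up
            if (domain != null)
                AppDomain.Unload(domain);
        }
    }

    // Assumption: public in this sketch; the earlier stub is private
    public void RunUnstableMethodInternal()
    {
        // Hypothetical stand-in for the real unstable code
        throw new InvalidOperationException("Simulated failure in unstable code");
    }

    private static void AppDomain_UnhandledException(object sender, UnhandledExceptionEventArgs e)
    {
        OnUnhandledException(e.ExceptionObject, e.IsTerminating);
    }

    private static void OnUnhandledException(object exception, bool isTerminating)
    {
        // Copy to a local so a concurrent unsubscribe can't null it mid-call
        EventHandler<UnhandledExceptionEventArgs> handler = UnhandledException;
        if (handler != null)
            handler(null, new UnhandledExceptionEventArgs(exception, isTerminating));
    }
}
```

With the stand-in body above, the exception crosses the domain boundary, lands in the catch block, and is passed to whoever subscribed to the static event.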

Now with our class in place we can write a little console app to call our method:

    using System;
    using System.Diagnostics;

    class Program
    {
        static void Main(string[] args)
        {
            // Send Debug output to the console, registered once up front
            Debug.Listeners.Add(new ConsoleTraceListener());

            AppDomainMethodCaller.UnhandledException += 
                new EventHandler<UnhandledExceptionEventArgs>(AppDomainMethodCaller_UnhandledException);
            AppDomainMethodCaller.RunUnstableMethod();
            Console.WriteLine("\nPress enter to continue.");
            Console.ReadLine();
        }

        static void AppDomainMethodCaller_UnhandledException(object sender, UnhandledExceptionEventArgs e)
        {
            // Debug.WriteLine only produces output in DEBUG builds
            Debug.WriteLine(e.ExceptionObject.ToString());
        }
    }

And we’re done!

There are a couple of very important things you should know about this code. If you need to keep a reference to the class in the remote app domain, you’ll need to keep the app domain alive, or it’ll throw an exception when you try to reference anything in the class. Also, multiple calls to this method will mean multiple domains, which aren’t all that cheap to create and will consume extra memory, so don’t go crazy with it.
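If you do want to keep a reference around and avoid creating a domain per call, one approach is to wrap the domain and the remote object in a single disposable host. This is only a sketch targeting the .NET Framework: IsolatedWorkerHost, UnstableWorker and the Run method are hypothetical names, and the doubling in Run is a placeholder for real work. The InitializeLifetimeService override is there because a remoted object’s lease can otherwise expire after a period of inactivity, at which point calls through the proxy start failing.

```csharp
using System;

// Hypothetical worker that lives in the secondary domain
public sealed class UnstableWorker : MarshalByRefObject
{
    // Returning null gives the object an infinite lease, so the proxy
    // stays usable for as long as its domain is loaded
    public override object InitializeLifetimeService()
    {
        return null;
    }

    // Placeholder for the real unstable call
    public int Run(int input)
    {
        return input * 2;
    }
}

// Hypothetical host: one domain, created once and unloaded on Dispose
public sealed class IsolatedWorkerHost : IDisposable
{
    private AppDomain domain;
    private readonly UnstableWorker worker;

    public IsolatedWorkerHost()
    {
        domain = AppDomain.CreateDomain("IsolatedWorkerHost",
            null,
            AppDomain.CurrentDomain.SetupInformation);

        Type workerType = typeof(UnstableWorker);
        worker = (UnstableWorker)domain.CreateInstanceAndUnwrap(
            workerType.Assembly.FullName,
            workerType.FullName);
    }

    // Each call reuses the same domain and the same remote instance
    public int Run(int input)
    {
        return worker.Run(input);
    }

    public void Dispose()
    {
        if (domain != null)
        {
            AppDomain.Unload(domain);
            domain = null;
        }
    }
}
```

Wrapping the host in a using block keeps the domain alive for exactly as long as you need the reference, and disposing then recreating the host gives you a fresh domain if the state gets muddied.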

Phil

2 Responses to “When failure is not an option”

  1. matburton says:

    Is this really true?

    “Using an AppDomain makes it possible for a thread to raise an exception without the entire application dying, also you can be notified of any unhandled exception in any thread within that application domain. Thirdly, you can unload and recreate the AppDomain, meaning you won’t be working with muddied state.”

    It seems that an unhandled exception in an AppDomain will trigger the UnhandledException event, but the runtime will always subsequently terminate the process.

    If I’m wrong I’d love to see an example where a managed thread in a secondary AppDomain throws an unhandled exception and the process recovers by unloading and reloading that AppDomain.

  2. Phil says:

    Hey matburton,

    Actually, it does look like I’m wrong on this one, and I’m not able to repeat the process of having a thread in a secondary domain throwing an exception without killing the whole process, so either I’m missing something in this example which I can’t remember, or I’m just wrong 🙂

    I’m going to do a little more research, and I’ll update the post if I figure anything out.

    Thanks,
    Phil