Drupal/PHP: User Sessions and Garbage Collection

I recently had to troubleshoot a web site running on the LAMP platform.  I have deep knowledge of the Windows platform technologies, but only a passing knowledge of the technologies on which this particular site was built.

The specific technologies involved were PHP 5.2 (5.2.6), MySQL 5 (5.0.51a), Apache 2.2.9, and Debian Linux (lenny9).  The web site was based on Drupal 6 (6.19).  Everything was running on a single virtualized server.  The problem was that the database backups were taking so much time and consuming so many server resources that the web site was becoming unavailable for 10-20 minutes every night.

Following is a description of what I did to find and correct the problem.  This description will highlight some SQL queries that are useful for troubleshooting MySQL, as well as configuration settings that are key to management of Drupal user sessions and PHP garbage collection.

The original administrator for the site had set up a process to back up the MySQL database, compress the backup file into a ZIP, and move the ZIP file to a secondary network location.  As the site outages corresponded to the times when this backup process was running, it seemed clear that it was the cause of the problems.  The question was why.

The resources allocated to the virtual server, particularly RAM, were somewhat limited.  So an initial guess at the source of the problem was simply that the server was underpowered.

I examined the web server logs and confirmed that the web server was producing HTTP 503 errors during the period of time that the backups were occurring.  Further examination of the web server logs showed that the periods of time that the 503 errors were occurring each night had been gradually increasing over time.  In fact, the site had run for more than 18 months without a problem.  After that, 503 errors began to appear.  These errors had gone unnoticed for about three months.  Simple site monitors were in place to verify the availability of the web site, but those monitors only checked the site every 10 to 15 minutes.  Because of this, the initial site outages were rarely long enough to trigger the site monitors.  After about three months the outages became long enough to consistently trigger the site monitors, and forced my investigation into the source of the problem.

In all of this time the server resources were unchanged, so the theory that the site was underpowered was called into question by the slow-but-steady escalation of the problem.

On the other hand, data was constantly being added to the site, so I began to consider the possibility that the growth of the database was to blame.  It seemed feasible that the backups would take longer as the database grew, as would the corresponding site outages.

I ran the following SQL query on the MySQL server, which returned the size in MB of each database on the server.  It showed that the database for the web site was 5GB in size.

SELECT s.schema_name,  
   SUM(t.data_length + t.index_length)/1024/1024 total_size,  
  (SUM(t.data_length + t.index_length)-SUM(t.data_free))/1024/1024 data_used,
   SUM(data_free)/1024/1024 data_free
FROM INFORMATION_SCHEMA.SCHEMATA s INNER JOIN INFORMATION_SCHEMA.TABLES t
    ON s.schema_name = t.table_schema
GROUP BY s.schema_name
ORDER BY total_size DESC;

5GB seemed an excessively large size for that database, and provided a likely explanation for why the backup operations were taking so long.  To further investigate, I adjusted the SQL query to show the sizes of each individual table on the server.  The new query looked like this…

SELECT s.schema_name, table_name,  
  SUM(t.data_length + t.index_length)/1024/1024 total_size,  
(SUM(t.data_length + t.index_length)-SUM(t.data_free))/1024/1024 data_used, 
  SUM(data_free)/1024/1024 data_free
FROM INFORMATION_SCHEMA.SCHEMATA s INNER JOIN INFORMATION_SCHEMA.TABLES t
    ON s.schema_name = t.table_schema
GROUP BY s.schema_name, table_name
ORDER BY total_size DESC;

… which revealed that the "sessions" table in the database was 4GB.  Ah-ha!  Now the questions centered on the “sessions” table. What was it used for?  And what was a reasonable size for it?

A little research showed that the "sessions" table is a Drupal table that stores user session information for the web site.  If user sessions don’t expire, the “sessions” table grows indefinitely. 

One final SQL query, shown here, revealed that there was data in the “sessions” table that was more than two years old.  Notice the use of the FROM_UNIXTIME function to transform the timestamp values to a human-readable date format.

SELECT uid, sid, hostname,  
  FROM_UNIXTIME(timestamp, ‘%Y-%d-%m %H:%I:%S %X’) AS datecreated, 
  cache, session
from sessions
order by datecreated;

That gave me the explanation for why the problem became worse over time… user sessions were not expiring, which meant that the “sessions” table kept growing, and therefore the database backup and compress operations were taking longer and longer to complete.

So… the final question… why was this happening?  The blog post found at http://success.grownupgeek.com/index.php/2008/08/14/drupal-sessions-table/  held the answer.  It explains that  the settings.php file contains the Drupal settings that control the use of the “sessions” table.  The default settings are as follows:

ini_set(‘session.cache_expire’,     200000);
ini_set(‘session.cache_limiter’,    ‘none’);
ini_set(‘session.cookie_lifetime’,  2000000);
ini_set(‘session.gc_maxlifetime’,   200000);

200000 seconds sets a session length of more than 2 days. The blog post recommended settings for those values that were more like this (45 minute sessions):

ini_set(‘session.cache_expire’,     2700);
ini_set(‘session.cache_limiter’,    ‘none’);
ini_set(‘session.cookie_lifetime’,  0);
ini_set(‘session.gc_maxlifetime’,   2700);

I think that 45 minutes may be a bit short, but it does make sense to reduce the session length from the default settings (which are overlong, in my opinion).

I checked the configuration for the site (the file was located at /var/www/citebank.org/sites/default/settings.php) to see if the settings had been modified from the default. Nope; the site was still running at the default of more than 2 days per user session.  Still, even with the default settings there should not have been data more than two years old in the “sessions” table.

What was going on?  Going back to the blog post, I noticed a user comment that held the answer.  The comment reads “Debian/Ubuntu doesn’t have garbage collection set up by default in the apt-get version of PHP5”.  

Garbage collection is the process that cleans old data from the “sessions” table.  So, if garbage collection wasn’t configured, it would explain why the table contained such old data.  The site I was troubleshooting was running on PHP5 on Debian Linux.  apt-get  is a command line tool for managing software on Linux.  I did not know how (or if there even was a way) to find out if apt-get was used to install PHP, but it seemed likely.

The blog comment went on to describe some additional modifications to the settings.php file.  To ensure that garbage collection is configured, the following two lines need to exist (or be added) to the settings.php file:

ini_set(‘session.gc_probability’,  1);
ini_set(‘session.gc_divisor’,   100);

At this point, I was convinced that I had enough information to correct the problems.  I did have one concern, though.  Given that there was 4GB and millions of rows worth of data in the “sessions” table, I assumed that changing the garbage collection settings would cause Drupal to issue delete statements against all of the rows that were older than the configured length of a session.  That would be bad… millions of logged DELETE operations against a database happening all at once will bring the server to its knees.

I knew that a TRUNCATE TABLE statement issued against a SQL Server table was a quick (re: not logged) way of clearing out a table, but I wasn’t sure if that was also true on MySQL.  A little research showed that the version of MySQL on which the site was running would implement a TRUNCATE TABLE statement by actually dropping and recreating the “sessions” table… quick and easy.  And, no side effects other than a handful of active user sessions being prematurely dropped.  In this case, that was an acceptable side effect.

After having learned all of this, here are the steps that I took to correct the problem.

  1. Edit the settings.php file and modify the existing session variables: session.cache_expire, session.cache_limiter, session_cookie_lifetime, and session.gc_maxlifetime.
  2. Add the additional session variables to settings.php: session.gc_probability and session.gc_divisor.
  3. Issue a “TRUNCATE TABLE sessions” command against the MySQL database.
  4. Restart Apache.

That’s it.  A lot of pain caused by a few missing configuration settings.  Lesson learned:  to avoid unexpected problems, be sure to appropriately configure session lifetime and garbage collection when setting up a web site based on Drupal.

Catching Unhandled Exceptions in ASP.NET

There are various methods that can be used to catch unhandled exceptions in an ASP.NET application.  The appropriate method to use depends on the nature of the exception being thrown.  This post walks through several examples to demonstrate several different types of “unhandled” exceptions and how to catch them.

This investigation into unhandled exceptions was initiated by a stack overflow exception being thrown in a production application.  That type of error proved to be the most challenging “unhandled exception” to handle, not least because of some incomplete or unclear documentation.

The following examples were tested in an ASP.NET WebForms (yes, boring old WebForms) application compiled with ASP.NET 4.0 and hosted with IIS 7.5 on Windows 7.

One type of “unhandled exception” is an exception thrown by a section of code that is not wrapped in a try-catch block.  These types of exceptions can be caught by adding an exception handler to the Application_Error event in the global.asax file of a web application.

Most ASP.NET developers are familiar with the Application_Error event.  The following example shows an implementation of this event handler that catches and logs unhandled exceptions.  Notice that the InnerException of the Exception is what is actually logged.  This is because the original exception is wrapped in an HttpUnhandledException by the time it is caught by the Application_Error event.

void Application_Error(object sender, EventArgs e)
{
   Exception ex = Server.GetLastError();

   // The original error may have been wrapped in a HttpUnhandledException,
   // so we need to log the details of the InnerException.
   ex = ex.InnerException ?? ex;

   try
   {
       // Log the error
       string errMsg = string.Empty;
       if (ex.Message != null) errMsg = "Message:" + ex.Message + "\r\n";
       if (ex.StackTrace != null) errMsg += "Stack Trace:" + ex.StackTrace;
       // * WRITE TO LOG *
               
       Server.ClearError();
   }
   catch
   {
   }

   Response.Redirect("~/Error.aspx?err=" + ex.Message, false);
}

A page with a single button can be used to test the Application_Error error handler.  The code for the button click event is shown here.

protected void btnException_Click(object sender, EventArgs e)
{
   // This will raise an exception, which we won’t handle here
   throw (new Exception("Test Exception"));
}

Another type of unhandled exception is an error (again not wrapped in a try-catch block) that occurs outside the normal request processing context of the ASP.NET runtime.  An example is an error that occurs on another thread.  An HttpModule that registers an event handler for the UnhandledException event of the current AppDomain can be used to catch such exceptions.

Http Modules are assemblies that are called on every request.  In that respect they are similar to ISAPI filters.  Unlike ISAPI filters, they are written in managed code and are integrated with the ASP.NET application life cycle.  ASP.NET itself uses modules to implement features such as forms authentication and caching.  In regards to handling exceptions, the most important feature of http modules is that can consume application events.

The following is the complete code of a class that implements the IHttpModule interface.  It includes an event handler for UnhandledException events.

using System;
using System.Diagnostics;
using System.Globalization;
using System.IO;
using System.Runtime.InteropServices;
using System.Text;
using System.Threading;
using System.Web;

namespace WebMonitor {

   public class UnhandledExceptionModule : IHttpModule
   {
       static int _unhandledExceptionCount = 0;
       static string _sourceName = null;
       static object _initLock = new object();
       static bool _initialized = false;

       public void Init(HttpApplication app)
       {
           // Do this one time for each AppDomain.  Verify that we’re on the correct ASP.NET version and
           // that the EventLog has been properly configured.  If all is well, register an event handler for
           // unhandled exceptions.
           if (!_initialized) {
               lock (_initLock) {
                   if (!_initialized) {
                       string webenginePath = Path.Combine(RuntimeEnvironment.GetRuntimeDirectory(),
                                   "webengine.dll");

                       if (!File.Exists(webenginePath)) {
                           throw new Exception(String.Format(CultureInfo.InvariantCulture,
                               "Failed to locate webengine.dll at ‘{0}’.  This module requires .NET Framework 2.0.",
                               webenginePath));
                       }

                       FileVersionInfo ver = FileVersionInfo.GetVersionInfo(webenginePath);
                       _sourceName = string.Format(CultureInfo.InvariantCulture, "ASP.NET {0}.{1}.{2}.0",
                                                   ver.FileMajorPart, ver.FileMinorPart, ver.FileBuildPart);

                       if (!EventLog.SourceExists(_sourceName)) {
                           throw new Exception(String.Format(CultureInfo.InvariantCulture,
                               "There is no EventLog source named ‘{0}’. Module requires .NET Framework 2.0.",
                               _sourceName));
                       }

                       AppDomain.CurrentDomain.UnhandledException += 
                             new UnhandledExceptionEventHandler(OnUnhandledException);

                       _initialized = true;
                   }
               }
           }
       } 

       void OnUnhandledException(object o, UnhandledExceptionEventArgs e)
       {
           // Let this occur one time for each AppDomain.
           if (Interlocked.Exchange(ref _unhandledExceptionCount, 1) != 0) return;

           // Build a message containing the exception details
           StringBuilder message = new StringBuilder("\r\n\r\nUnhandledException logged by
                     UnhandledExceptionModule.dll:\r\n\r\nappId=");
           string appId = (string) AppDomain.CurrentDomain.GetData(".appId");
           if (appId != null) message.Append(appId);

           Exception currentException = null;
           for (currentException = (Exception)e.ExceptionObject; 
                  currentException != null;
                  currentException = currentException.InnerException) {
               message.AppendFormat("\r\n\r\ntype={0}\r\n\r\nmessage={1}\r\n\r\nstack=\r\n{2}\r\n\r\n",
                                    currentException.GetType().FullName,
                                    currentException.Message,
                                    currentException.StackTrace);
           }          

           // Log the information to the event log
           EventLog Log = new EventLog();
           Log.Source = _sourceName;
           Log.WriteEntry(message.ToString(), EventLogEntryType.Error);
       }
   }
}

To use the HttpModule within a web application, compile it and register the assembly in the web.config file, as shown here.

<system.webServer>
<modules runAllManagedModulesForAllRequests="true">
   <add type="WebMonitor.UnhandledExceptionModule" name="UnhandledExceptionModule"/>
</modules>
</system.webServer>

Testing this error handler is a bit more difficult, because the test needs to show that exceptions that bypass the Application_Error event handler are caught by the HttpModule.  An error needs to be thrown that is not caught by the “normal” ASP.NET error pipeline (for example, the Application_Error event). 

Again start with a single button on a web page.  The click event of the button needs to spawn a thread that throws an exception which is not wrapped in a try-catch block..  Here is the code for the click event.

protected void btnUnhandled_Click(object sender, EventArgs e)
{
   // Queue the task.
   ThreadPool.QueueUserWorkItem(new WaitCallback(ThreadProc));

   // The Sleep gives the background thread time to run
   Thread.Sleep(1000);
}

// This thread procedure performs the task.
static void ThreadProc(Object stateInfo) {
   throw (new Exception("Test Unhandled exception"));
}

Because the exception happens on a separate thread, the Application_Error event does not catch it. However, the HttpModule does.

Note that such an HttpModule exception handler will also catch any exceptions that an Application_Error event handler in global.asax will catch. So, an HttpModule exception handler can be used in tandem with an Application_Error event handler , or in place of the Application_Error event.

The final type of unhandled exception to examine is an exception that corrupts the state of the application.  Probably the best-known example of this is a stack overflow.  Because they require special handling, it might seem that exceptions like a StackOverflowException are simply unhandled exceptions that occur outside the normal request processing context of ASP.NET, just as the error in the previous example.  In fact, exceptions that corrupt the state of the application are a different class of exception entirely, and by definition cannot be caught

This is true despite conflicting documentation that suggests that http modules can catch such errors, or that the legacyUnhandledExceptionPolicy setting in the aspnet.config file (located in the framework folder) can be modified to allow ASP.NET to handle such exceptions in a legacy manner (i.e. like ASP.NET 1.0 and 1.1). 

Furthermore, some documentation suggests that stack overflow errors can be caught if the block of code throwing the error is decorated with the System.Security.SecurityCritical and System.Runtime.ExceptionServices.HandleProcessCorruptedStateExceptions attributes.  (This, of course, assumes that you know the block of code throwing the error.)

The following is the codebehind for a page that generates a stack overflow error by calling a recursive function that never exits.  It illustrates the use of the SecurityCritical and HandleProcessCorruptedStateExceptions attributes that are supposed to allow corrupted state exceptions, including stack overflows, to be caught.  The attributes have no effect; the exceptions are not caught by the try-catch block.

using System;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;

namespace UnhandledExceptionWebApp
{
   // Attribute doesn’t seem to work as advertised
   [System.Security.SecurityCritical]
   public partial class StackOverflow : System.Web.UI.Page
   {
       // Attribute doesn’t seem to work as advertised; the stack overflow is NOT caught 
       [System.Runtime.ExceptionServices.HandleProcessCorruptedStateExceptions]
       protected void Page_Load(object sender, EventArgs e)
       {
           try
           {
               this.Overflow(true);
           }
           catch (Exception ex)
           {
               Response.Redirect("~/Error.aspx?err=" + ex.Message, false);
           }
       }

       // Recursive function causes stack overflow
       private void Overflow(Boolean keepGoing)
       {
           if (keepGoing) this.Overflow(keepGoing);
       }
   }
}

In addition, if this page is added to an application that implements the previously discussed Application_Error and HttpModule event handlers, the stack overflow error is not caught.  Even changing the legacyUnhandledExceptionPolicy setting in the aspnet.config file has no effect.  The stack overflow exception is not caught by any of the error handlers.  It seems that all of the documentation that suggests various methods for capturing stack overflow exceptions is incorrect or misleading.

It appears that there is NO WAY to catch and log a stack overflow error.  So, how can a stack overflow exception be “handled”?

The answer is to use the Debug Diagnostic Tool from Microsoft (the latest version at the time of this writing is 1.2).   This tool includes a debugger service that can capture a dump file when a stack overflow occurs.  That file can then be analyzed to find the code that is causing the stack overflow.

Complete configuration and usage details for the Debug Diagnostic Tool are outside the scope of this post.  In brief, the steps to follow to capture a stack trace when a stack overflow exception occurs are:

  1. Install the Debug Diagnostic Tool
  2. Create a Rule to capture Stack Overflow exceptions and perform a Log Stack Trace action.
  3. Run the web application.
  4. Run the Debug Diagnostic Tool.
  5. Cause the exception to occur.

For more detailed information, see the documentation of the tool here.

When a stack overflow exception occurs, the Debug Diagnostic Tool will capture a stack trace and write it to a log file.  An example of the log contents can be seen here (with the function call that is producing the stack overflow highlighted):

[9/1/2011 11:10:24 PM] First chance exception – 0xc00000fd caused by thread with System ID: 4004
[9/1/2011 11:10:24 PM] Stack Trace
ChildEBP RetAddr Args to Child
WARNING: Frame IP not in any known module. Following frames may be wrong.
03c5300c 00bc0c74 017751b4 00000000 00000001 0xbc0c6e
03c53020 00bc0c74 017751b4 00000000 00000001 0xbc0c74
03c53034 00bc0c74 017751b4 00000000 00000001 0xbc0c74
… (the preceding line repeated many many times)

03c8eccc 00bc0c74 StackOverflowWebApp.StackOverflow.Overflow(Boolean)
03c8ece0 00bc0c74 StackOverflowWebApp.StackOverflow.Overflow(Boolean)
03c8ecf4 00bc0c74 StackOverflowWebApp.StackOverflow.Overflow(Boolean)
… (the preceding line repeated many many times)

03c8ed08 00bc0bb5 StackOverflowWebApp.StackOverflow.Page_Load(System.Object, System.EventArgs)
03c8ed54 0116d5cf System.Web.Util.CalliHelper.EventArgFunctionCaller(IntPtr, System.Object, System.Object, System.EventArgs)
03c8ed64 5d7d5694 System.Web.Util.CalliEventHandlerDelegateProxy.Callback(System.Object, System.EventArgs)
03c8ed78 5d7c8cbc System.Web.UI.Control.OnLoad(System.EventArgs)
03c8ed8c 5d7c8d1b System.Web.UI.Control.LoadRecursive()
03c8eda4 5d7c66e0 System.Web.UI.Page.ProcessRequestMain(Boolean, Boolean)
03c8efcc 5d7c5cad System.Web.UI.Page.ProcessRequest(Boolean, Boolean)
03c8f004 5d7c5bcf System.Web.UI.Page.ProcessRequest()

The log shows a stack trace which positively identifies the part of the code that is throwing the error (the StackOverflowWebApp.StackOverflow.Overflow(Boolean) method).

Note that the Debug Diagnostic Tool service is set to start automatically.  This may not be desirable, especially if the tool is only needed briefly to debug a particular error.  Also, the tool seems to affect the performance of the web site being debugged.  Use this tool carefully, especially if it must be pointed at a production web site.

In summary, unhandled exceptions in an ASP.NET application can be caught with an Application_Error event handler in the global.asax, or by creating a HttpModule to catch the AppDomain.UnhandledException event.  An HttpModule is required to catch unhandled exceptions that occur outside the normal processing of requests by the ASP.NET runtime.  For errors that corrupt the state of the application, such as stack overflow exceptions, use the Debug Diagnostic Tool to capture a stack trace at the time of the error.

The complete source code for an application that includes all of the examples shown here is available for download.  Please note that the web site should be compiled and hosted under IIS to ensure that the the error handlers will behave properly.  Running the application in debugging mode from within Visual Studio produces different results than you will see in a production environment.  Visual Studio tries to help handle the errors, but that prevents some of the intended event handlers from working as expected, and does not allow for a complete understanding of how the various error handlers work outside the development environment.

St. Louis Day of .NET – jQuery Plug-ins

This is part of a series of posts containing my notes from the sessions I attended at the 2011 St. Louis Day of .NET conference.

This series does not attempt to give complete accounts of the information presented in each session; it is just a way to capture the bullet points, notes, and opinions that I recorded while attending the conference. I have previously posted a list of all of the session materials and sample code that I have been able to find online, so if you are looking for a more precise account of a session, try looking there.

One of the better sessions I attended at this year’s conference was Ian Robinson’s “Building jQuery Plug-Ins”.  Too many sessions I attended this year skipped the how-to-get-started and jumped right into examining the code of a finished product.  This session, on the other hand, stepped through the entire process of building a jQuery plug-in.  Here are my notes from the session:

The Process

  • Wrap business logic with plug-in logic
  • The business logic can stay largely the same
  • Modify to inject settings and context
  • Modify for safety and to play well with others
    First step

Wrap in a closure…

(function($) {
     // ahh… safety
})(jQuery);

This structure ensures that our code doesn’t conflict with others (takes it out of global scope and defines the plug-in’s own scope).  It also ensures that when a dollar sign ($) is used in the body of this code, it means jQuery (and not anything else).

http://nathansjslessons.appspot.com/ (What’s A Closure?)

Define the Plug-in

No selecting needed:

     $.fn.myPlugin = function(options) {…};

Select and return (chain):

     $.fn.myPlugin = function(options) {
          return this.each(function() {
               …
          });
     };

Establish Default Options

     $.fn.myPlugin.defaultOptions = {
          speed: ‘slow’
     };

     var opts = $.extend({}, $.fn.myPlugin.defaultOptions, options);

Inject HTML Context

Options aren’t just for settings.

Inject all context through options.  For example:

     $(‘.nav’).myPlugin({
          speed: ‘fast’,
          subNavSelector: ‘li.subnav’
     });

Events

Determine if you need to enforce document.ready.  If not, let the user of the plug-in do it.

If you’re ever unbinding events, bind with a namespace first.  For example:

     $(‘elem’).unbind(‘click.Namespace’, function(){…});

Complete Example

(function($) {
     $.fn.myPlugin = function(options) {
          var opts = $.extend({}, $.fn.myPlugin.defaultOptions, options), $moduleWrap = this;
          // Iterate over all elements contained in the current context ($moduleWrap = ".myClass"???)
          $moduleWrap.each(function() {
               // Grab the element being evaulated
               var $module = $(this);
               … do something …
          })
     };

     $.fn.myPlugin.defaultOptions = { 
          speed: ‘slow’ 
          subNavSelector: ‘li.subnav’
     };

     $(document).ready(function() {
          $(‘.myClass’).myPlugin();
     });
})(jQuery);

More examples and documentation
http://css-tricks.com/snippets/jquery/jquery-plugin-template/
http://docs.jquery.com/Plugins/Authoring
http://jquery.ian-robinson.com (jQuery Crash Course)

Follow

Get every new post delivered to your Inbox.