Data Access Framework Comparison

Introduction

For some time now I have been working on a project that utilizes a custom-built data access framework, rather than popular ORM frameworks such as Entity Framework or NHibernate.

While the custom framework has worked well for the project, I had questions about it.  For example, it uses stored procedures to implement basic CRUD operations, and I wondered if inline parameterized SQL statements might perform better.  Also, I wondered about the performance of the custom framework compared to the leading ORMs.

Besides my questions about the custom framework, I recognized the importance of having at least a basic understanding of how to use the other ORM frameworks.

In order to answer my questions about the custom framework and to gain some practical experience with the other ORMs, I created a simple web application that uses each of those frameworks to perform basic CRUD operations.  While executing the CRUD operations, the application times them and produces a summary report of the results.

The code for the test application can be found at https://github.com/mlichtenberg/ORMComparison.

NOTE: I assume that most readers are familiar with the basics of Entity Framework and NHibernate, so I will not provide an overview of them here.

Using the custom framework is similar to Entity Framework and NHibernate’s “database-first” approach.  Any project that uses the library references a single assembly containing the base functionality of the library.  A T4 template is used to generate additional classes based on tables in a SQL Server database.  Some of the classes are similar to EF’s Model classes and NHibernate’s Domain classes.  The others provide the basic CRUD functionality for the domain/model classes. 

For these tests I made a second copy of the custom framework classes that provide the basic CRUD functionality, and edited them to replace the CRUD stored procedures with parameterized SQL statements.

The custom framework includes much less overhead on top of ADO.NET than the popular ORMs, so I expected the tests to show that it was the best-performing framework.  The question was, how much better?

In the rest of this post, I will describe the results of my experiment, as well as some of the optimization tips I learned along the way.  Use the following links to jump directly to a topic.

Test Application Overview
“Out-of-the-Box” Performance
Entity Framework Performance After Code Optimization
     AutoDetectChangesEnabled and DetectChanges()
     Recycling the DbContext
NHibernate Performance After Configuration Optimization
     What’s Up with Update Performance in NHibernate?
Results Summary

Test Application Overview

    A SQL Server Express database was used for the tests.  The data model is borrowed from Microsoft’s Contoso University sample application.  Here is the ER diagram for the database:

[ER diagram of the Contoso University sample database]

The database was pre-populated with sample data.  The number of rows added to each table was:

Department: 20
Course: 200
Person: 100000
Enrollment: 200000

This was done because SQL Server’s optimizer will behave differently with an empty database than it will with a database containing data, and I wanted the database to respond as it would in a “real-world” situation.  For the tests, all CRUD operations were performed against the Enrollment table.

Five different data access frameworks were tested:

  1. Custom framework with stored procedures
  2. Custom framework with parameterized SQL statements
  3. Entity Framework
  4. NHibernate
  5. Fluent NHibernate

The testing algorithm follows the same pattern for each of the frameworks:

01) Start timer
02) For a user-specified number of iterations 
03)      Submit an INSERT statement to the database
04)      Save the identifier of the new database record
05) End timer
06) Start timer
07) For each new database record identifier
08)      Submit a SELECT statement to the database
09) End timer
10) Start timer
11) For each new database record identifier
12)      Submit an UPDATE statement to the database
13) End timer
14) Start timer
15) For each new database record identifier
16)      Submit a DELETE statement to the database
17) End timer
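
In C#, that pattern amounts to wrapping each loop in a System.Diagnostics.Stopwatch.  The following is a minimal sketch of such a timing harness, not the repository’s exact code; the operation delegate stands in for the framework-specific INSERT, SELECT, UPDATE, or DELETE call.

    // Minimal sketch of a timing harness (illustrative only).
    using System;
    using System.Diagnostics;

    public static class TimingHarness
    {
        // Runs the supplied operation once per iteration and returns the elapsed time.
        public static TimeSpan TimePhase(int iterations, Action<int> operation)
        {
            Stopwatch timer = Stopwatch.StartNew();   // Start timer
            for (int i = 0; i < iterations; i++)      // For the requested number of iterations
            {
                operation(i);                         // Submit the statement to the database
            }
            timer.Stop();                             // End timer
            return timer.Elapsed;
        }
    }

Each framework’s INSERT, SELECT, UPDATE, and DELETE loops can then be passed in as the operation delegate and timed in exactly the same way.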

Note that after the test algorithm completes, the database is in the same state as when the tests began.

To see the actual code, visit https://github.com/mlichtenberg/ORMComparison/blob/master/MVCTestHarness/Controllers/TestController.cs.

"Out-of-the-Box" Performance

I first created very basic tests for each framework. Essentially, these were the “Hello World” versions of the CRUD code for each framework.  No optimization was attempted.

Here is an example of the code that performs the INSERTs for the custom framework.  There is no difference between the version with stored procedures and the version without, other than the namespace from which EnrollmentDAL is instantiated.

    DA.EnrollmentDAL enrollmentDAL = new DA.EnrollmentDAL();

    for (int x = 0; x < Convert.ToInt32(iterations); x++)
    {
        DataObjects.Enrollment enrollment = enrollmentDAL.EnrollmentInsertAuto
            (null, null, 101, 1, null);
        ids.Add(enrollment.EnrollmentID);
    }

      And here is the equivalent code for Entity Framework:

    using (SchoolContext db = new SchoolContext())
    {
       for (int x = 0; x < Convert.ToInt32(iterations); x++)
        {
            Models.Enrollment enrollment = new Models.Enrollment {
                CourseID = 101, StudentID = 1, Grade = null };
            db.Enrollments.Add(enrollment);
            db.SaveChanges();
            ids.Add(enrollment.EnrollmentID);
        }

    }

    The code for NHibernate and Fluent NHibernate is almost identical.  Here is the NHibernate version:

using (var session = NH.NhibernateSession.OpenSession("SchoolContext"))
{
    var course = session.Get<NHDomain.Course>(101);
    var student = session.Get<NHDomain.Person>(1);

    for (int x = 0; x < Convert.ToInt32(iterations); x++)
    {
        var enrollment = new NHDomain.Enrollment { 
            Course = course, Person = student, Grade = null };
        session.SaveOrUpdate(enrollment);

        ids.Add(enrollment.Enrollmentid);
    }

}

The SELECT, UPDATE, and DELETE code for each framework followed similar patterns. 
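
For example, a DELETE loop for Entity Framework might look roughly like the following.  This is a hedged sketch based on the INSERT code above, not a copy of the code in the repository.

    using (SchoolContext db = new SchoolContext())
    {
        foreach (int id in ids)
        {
            // Find() loads the entity by its primary key; Remove() marks it for deletion.
            Models.Enrollment enrollment = db.Enrollments.Find(id);
            db.Enrollments.Remove(enrollment);
            db.SaveChanges();
        }
    }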

    NOTE: A SQL Server Profiler trace proved that the actual interactions with the database were the same for each framework.  The same database connections were established, and equivalent CRUD statements were submitted by each framework.  Therefore, any measured differences in performance are due to the overhead of the frameworks themselves.

        Here are the results of the tests of the “out-of-the-box” code:

      Framework              Operation     Elapsed Time (seconds)
      Custom                 Insert        5.9526039
      Custom                 Select        1.9980745
      Custom                 Update        5.0850357
      Custom                 Delete        3.7785886

      Custom (no SPs)        Insert        5.2251725
      Custom (no SPs)        Select        2.0028176
      Custom (no SPs)        Update        4.5381994
      Custom (no SPs)        Delete        3.7064278

      Entity Framework       Insert        1029.5544975
      Entity Framework       Select        8.6153572
      Entity Framework       Update        2362.7183765
      Entity Framework       Delete        25.6118191

      NHibernate             Insert        9.9498188
      NHibernate             Select        7.3306331
      NHibernate             Update        274.7429862
      NHibernate             Delete        12.4241886

      Fluent NHibernate      Insert        11.796126
      Fluent NHibernate      Select        7.3961941
      Fluent NHibernate      Update        283.1575124
      Fluent NHibernate      Delete        10.791648

      NOTE: For all tests, each combination of Framework and Operation was executed 10000 times.   Looking at the first line of the preceding results, this means that the custom framework took roughly 5.95 seconds to perform 10000 INSERTs.

      As you can see, both versions of the custom framework outperformed Entity Framework and NHibernate.  In addition, the version of the custom framework that used parameterized SQL was very slightly faster than the version that used stored procedures.  Most interesting, however, was the performance of the INSERT and UPDATE operations.  Entity Framework and both versions of NHibernate were not just worse than the two custom framework versions, they were much MUCH worse.  Clearly, some optimization and/or configuration changes were needed.

      Entity Framework Performance After Code Optimization

      AutoDetectChangesEnabled and DetectChanges()  

      It turns out that much of Entity Framework’s poor performance appears to have been due to the nature of the tests themselves.  Information on Microsoft’s MSDN website notes that if you are tracking a lot of objects in your DbContext object and call methods like Add() and SaveChanges() many times in a loop, your performance may suffer.  That scenario describes the test almost perfectly.

      The solution is to turn off Entity Framework’s automatic detection of changes by setting AutoDetectChangesEnabled to false and explicitly calling DetectChanges().  This instructs Entity Framework to only detect changes to entities when explicitly instructed to do so.  Here is what the updated code for performing INSERTs with Entity Framework looks like (the changes are the AutoDetectChangesEnabled setting and the explicit DetectChanges() call):

      using (SchoolContext db = new SchoolContext())
      {
          db.Configuration.AutoDetectChangesEnabled = false;

          for (int x = 0; x < Convert.ToInt32(iterations); x++)
          {
              Models.Enrollment enrollment = new Models.Enrollment {
                  CourseID = 101, StudentID = 1, Grade = null };
              db.Enrollments.Add(enrollment);
              db.ChangeTracker.DetectChanges();
              db.SaveChanges();
              ids.Add(enrollment.EnrollmentID);
          }
      }

      Here are the results of tests with AutoDetectChangesEnabled set to false:

      Framework           Operation    Elapsed Time (seconds)
      Entity Framework    Insert       606.5569332
      Entity Framework    Select       6.4425741
      Entity Framework    Update       605.6206616
      Entity Framework    Delete       21.0813293

      As you can see, INSERT and UPDATE performance improved significantly, and SELECT and DELETE performance also improved slightly.

      Note that turning off AutoDetectChangesEnabled and calling DetectChanges() explicitly will slightly improve Entity Framework’s performance in any scenario.  However, if DetectChanges() is not called everywhere it is actually needed, subtle bugs can result.  Therefore, it is best to apply this optimization only in very specific scenarios (such as the tight loops in these tests) and to allow the default behavior otherwise.
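
      One way to limit that risk is to scope the setting tightly and restore the caller’s value afterwards.  The following is a minimal sketch of that idea, assuming the same SchoolContext and Enrollment model used throughout this post (plus the usual System.Collections.Generic and Entity Framework namespaces); the enrollmentsToAdd collection is a hypothetical stand-in, and this is not a pattern taken from the test code.

      static void BulkAddEnrollments(SchoolContext db, IEnumerable<Models.Enrollment> enrollmentsToAdd)
      {
          bool originalSetting = db.Configuration.AutoDetectChangesEnabled;
          try
          {
              // Disable automatic change detection only for this bulk operation.
              db.Configuration.AutoDetectChangesEnabled = false;

              foreach (Models.Enrollment enrollment in enrollmentsToAdd)
              {
                  db.Enrollments.Add(enrollment);
              }

              db.ChangeTracker.DetectChanges();   // detect changes once, explicitly
              db.SaveChanges();
          }
          finally
          {
              // Restore the original behavior for any code that uses this context later.
              db.Configuration.AutoDetectChangesEnabled = originalSetting;
          }
      }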

      Recycling the DbContext

      While Entity Framework performance certainly improved by changing the AutoDetectChangesEnabled value, it was still relatively poor. 

      Another problem with the tests is that the same DbContext was used for every iteration of an operation (i.e. one DbContext object was used for all 10000 INSERT operations).  This is a problem because the context maintains a record of all entities added to it during its lifetime.  The effect of this was a gradual slowdown of the INSERT (and UPDATE) operations as more and more entities were added to the context.

      Here is what the Entity Framework INSERT code looks like after modifying it to periodically create a new Context (the change is the creation of a new SchoolContext every 100 or so inserts):

      for (int x = 0; x < Convert.ToInt32(iterations); x++)
      {
          // Use a new context for roughly every 100 Insert operations
          using (SchoolContext db = new SchoolContext())
          {
              db.Configuration.AutoDetectChangesEnabled = false;

              int count = 1;
              for (int y = x; y < Convert.ToInt32(iterations); y++)
              {
                  Models.Enrollment enrollment = new Models.Enrollment {
                      CourseID = 101, StudentID = 1, Grade = null };
                  db.Enrollments.Add(enrollment);
                  db.ChangeTracker.DetectChanges();
                  db.SaveChanges();
                  ids.Add(enrollment.EnrollmentID);

                  count++;
                  if (count >= 100) break;
                  x++;
              }
          }
      }

      And here are the results of the Entity Framework tests with the additional optimization added:

      Framework            Operation     Elapsed Time (seconds)
      Entity Framework     Insert        14.7847024
      Entity Framework     Select        5.5516514
      Entity Framework     Update        13.823694
      Entity Framework     Delete        10.0770142

      Much better!  The time to perform the SELECT operations was little changed, but the DELETE time was reduced by half, and the INSERT and UPDATE times decreased from a little more than 10 minutes to about 14 seconds.

      NHibernate Performance After Configuration Optimization

      For the NHibernate frameworks, the tests themselves were not the problem; NHibernate itself needed some tuning.

      An optimized solution was achieved by changing the configuration settings of the NHibernate Session object.  Here is the definition of the SessionFactory for NHibernate (the additions are the SetProperty calls):

      private static ISessionFactory SessionFactory
      {
          get
          {
              if (_sessionFactory == null)
              {
                  string connectionString = ConfigurationManager.ConnectionStrings
                      [_connectionKeyName].ToString();

                  var configuration = new NHConfig.Configuration();
                  configuration.Configure();

                  configuration.SetProperty(NHConfig.Environment.ConnectionString,
                      connectionString);

                  configuration.SetProperty(NHibernate.Cfg.Environment.FormatSql,
                      Boolean.FalseString);
                  configuration.SetProperty
                     (NHibernate.Cfg.Environment.GenerateStatistics,
                          Boolean.FalseString);
                  configuration.SetProperty
                     (NHibernate.Cfg.Environment.Hbm2ddlKeyWords,
                          NHConfig.Hbm2DDLKeyWords.None.ToString());
                  configuration.SetProperty(NHibernate.Cfg.Environment.PrepareSql,
                          Boolean.TrueString);
                  configuration.SetProperty
                      (NHibernate.Cfg.Environment.PropertyBytecodeProvider,
                          "lcg");
                  configuration.SetProperty
                      (NHibernate.Cfg.Environment.PropertyUseReflectionOptimizer,
                          Boolean.TrueString);
                  configuration.SetProperty
                      (NHibernate.Cfg.Environment.QueryStartupChecking,
                          Boolean.FalseString);
                  configuration.SetProperty(NHibernate.Cfg.Environment.ShowSql, 
                      Boolean.FalseString);
                  configuration.SetProperty
                      (NHibernate.Cfg.Environment.UseProxyValidator, 
                          Boolean.FalseString);
                  configuration.SetProperty
                      (NHibernate.Cfg.Environment.UseSecondLevelCache,
                          Boolean.FalseString);

                  configuration.AddAssembly(typeof(Enrollment).Assembly);
                  _sessionFactory = configuration.BuildSessionFactory();
              }
              return _sessionFactory;
          }
      }

      And here is the InitializeSessionFactory method for Fluent NHibernate, with the equivalent changes included:

      private static void InitializeSessionFactory()
      {
          string connectionString = ConfigurationManager.ConnectionStrings[_connectionKeyName]
              .ToString();

          _sessionFactory = Fluently.Configure()
              .Database(MsSqlConfiguration.MsSql2012.ConnectionString(connectionString).ShowSql())
              .Mappings(m => m.FluentMappings.AddFromAssemblyOf<Enrollment>())
              .BuildConfiguration().SetProperty
                  (NHibernate.Cfg.Environment.FormatSql, Boolean.FalseString)
              .SetProperty(NHibernate.Cfg.Environment.GenerateStatistics,
                  Boolean.FalseString)
              .SetProperty(NHibernate.Cfg.Environment.Hbm2ddlKeyWords,
                  NHibernate.Cfg.Hbm2DDLKeyWords.None.ToString())
              .SetProperty(NHibernate.Cfg.Environment.PrepareSql,
                  Boolean.TrueString)
              .SetProperty(NHibernate.Cfg.Environment.PropertyBytecodeProvider,
                  "lcg")
              .SetProperty
                  (NHibernate.Cfg.Environment.PropertyUseReflectionOptimizer,
                      Boolean.TrueString)
              .SetProperty(NHibernate.Cfg.Environment.QueryStartupChecking,
                  Boolean.FalseString)
              .SetProperty(NHibernate.Cfg.Environment.ShowSql, Boolean.FalseString)
              .SetProperty(NHibernate.Cfg.Environment.UseProxyValidator,
                  Boolean.FalseString)
              .SetProperty(NHibernate.Cfg.Environment.UseSecondLevelCache,
                  Boolean.FalseString)
              .BuildSessionFactory();
      }

      The following table gives a brief description of the purpose of these settings:

      Setting                   Purpose
      FormatSql                 Format the SQL before sending it to the database
      GenerateStatistics        Produce statistics on the operations performed
      Hbm2ddlKeyWords           Whether to automatically quote all database object names
      PrepareSql                Compiles the SQL before executing it
      PropertyBytecodeProvider  What bytecode provider to use for the generation of code
      QueryStartupChecking      Check all named queries present in the startup configuration
      ShowSql                   Show the produced SQL
      UseProxyValidator         Validate that mapped entities can be used as proxies
      UseSecondLevelCache       Enable the second level cache

      Notice that several of these (FormatSql, GenerateStatistics, ShowSql) are most useful for debugging.  It is not clear why they are enabled by default in NHibernate; it seems to me that these should be opt-in settings, rather than opt-out.

      Here are the results of tests of the NHibernate frameworks with these changes in place:

      Framework                        Operation     Elapsed Time (seconds)
      NHibernate (Optimized)           Insert        5.0894047
      NHibernate (Optimized)           Select        5.2877312
      NHibernate (Optimized)           Update        133.9417387
      NHibernate (Optimized)           Delete        5.6669841

      Fluent NHibernate (Optimized)    Insert        5.0175024
      Fluent NHibernate (Optimized)    Select        5.2698945
      Fluent NHibernate (Optimized)    Update        128.3563561
      Fluent NHibernate (Optimized)    Delete        5.5299521

      These results are much improved, with the INSERT, SELECT, and DELETE operations nearly matching the results achieved by the custom framework.   The UPDATE performance, while improved, is still relatively poor.

      What’s Up with Update Performance in NHibernate?

      The poor update performance is a mystery to me.  I have researched NHibernate optimization techniques and configuration settings, and have searched for other people reporting problems with UPDATE operations.  Unfortunately, I have not been able to find a solution.

      This is disappointing, as I personally found NHibernate more comfortable to work with than Entity Framework, and because it beats or matches the performance of Entity Framework for SELECT, INSERT, and DELETE operations.

      If anyone out there knows of a solution, please leave a comment!

      Results Summary

      The following table summarizes the results of the tests using the optimal configuration for each framework.  These are the same results shown earlier in this post, combined here in a single table.

      Framework                        Operation     Elapsed Time (seconds)
      Custom                           Insert        5.9526039
      Custom                           Select        1.9980745
      Custom                           Update        5.0850357
      Custom                           Delete        3.7785886

      Custom (no SPs)                  Insert        5.2251725
      Custom (no SPs)                  Select        2.0028176
      Custom (no SPs)                  Update        4.5381994
      Custom (no SPs)                  Delete        3.7064278

      Entity Framework (Optimized)     Insert        14.7847024
      Entity Framework (Optimized)     Select        5.5516514
      Entity Framework (Optimized)     Update        13.823694
      Entity Framework (Optimized)     Delete        10.0770142

      NHibernate (Optimized)           Insert        5.0894047
      NHibernate (Optimized)           Select        5.2877312
      NHibernate (Optimized)           Update        133.9417387
      NHibernate (Optimized)           Delete        5.6669841

      Fluent NHibernate (Optimized)    Insert        5.0175024
      Fluent NHibernate (Optimized)    Select        5.2698945
      Fluent NHibernate (Optimized)    Update        128.3563561
      Fluent NHibernate (Optimized)    Delete        5.5299521

      And here is a graph showing the same information:

      [Graph comparing the elapsed times for each framework and operation]


    Recommended Tool: Express Profiler for SQL Server Databases

    NOTE:  As I was writing up this post I discovered the news that SQL Profiler is deprecated as of the release of SQL Server 2016.  If this also affects the underlying SQL Server tracing APIs, then this news may affect the long-term future of the Express Profiler.  For now, however, it is a tool that I recommend.

    Express Profiler is a simple Open Source alternative to the SQL Profiler that ships with the full SQL Server Management Studio.  This is particularly useful when working with SQL Server Express databases, as the Express version of the Management Studio does NOT include the SQL Profiler.

    Usage of the Express Profiler should be self-explanatory to anyone familiar with the SQL Profiler.

    Here are some details about Express Profiler from the project page:

    • ExpressProfiler (aka SqlExpress Profiler) is a simple and fast replacement for SQL Server Profiler with basic GUI
    • Can be used with both Express and non-Express editions of SQL Server 2005/2008/2008r2/2012/2014 (including LocalDB)
    • Tracing of a basic set of events (Batch/RPC/SP:Stmt Starting/Completed, Audit login/logout, User error messages, Blocked Process report) and columns (Event Class, Text Data, Login, CPU, Reads, Writes, Duration, SPID, Start/End time, Database/Object/Application name) – both selectable
    • Filters on most data columns
    • Copy all/selected event rows to clipboard in form of XML
    • Find in "Text data" column
    • Export data in Excel’s clipboard format

    While I have found Express Profiler to be a good and useful tool, it is not as fully-featured as the SQL Profiler.  Here are some key "missing" features in Express Profiler:

    • No way to load a saved trace output, although that feature is on the roadmap for the tool.
    • No way to save trace output directly to a database table.
    • Fewer columns can be included in the trace output, and many fewer events can be traced.  In my experience, however, the columns and events that I find myself using in most cases are all available.
    • As there are fewer columns in the output, there are fewer columns on which to filter.  Again, the most common/useful columns and events are covered.
    • No way to create trace templates for use with future traces.

    Despite these limitations, I recommend this tool for situations where the full SQL Profiler is not available.

    How To Restore a Database in a SQL Server AlwaysOn Availability Group

    Edited March 3, 2017 in response to reader comments.  Thanks for pointing out my mistakes!

    Background

    There are two clustered servers running SQL Server 2014.  The servers host production databases as well as databases used for QA and testing.  One AlwaysOn Availability Group has been created for the production databases and one for the QA databases.  The same server serves as the primary for both AlwaysOn Availability Groups.

    One of the production databases needs to be restored from backup.  Backups are taken from the secondary server, not the primary.  The backups should be restored to the same server from which they were taken.

    The following tasks need to be completed in order to restore the database:

    • Make the secondary server from which the backups were taken the primary server
    • Remove the database to be restored from the AlwaysOn Availability Group
    • Restore the database
    • Add the database back into the AlwaysOn Availability Group

    Following are detailed instructions for completing these tasks.
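
    If you prefer scripts to the Management Studio dialogs, the same four tasks map onto a handful of T-SQL commands.  The outline below is only a hedged sketch: the availability group and database names are placeholders, and the step-by-step instructions that follow remain the authoritative walkthrough.

    -- 1) Fail over (run on the secondary that should become the new primary):
    ALTER AVAILABILITY GROUP [MyAvailabilityGroup] FAILOVER;

    -- 2) Remove the database from the availability group (run on the new primary):
    ALTER AVAILABILITY GROUP [MyAvailabilityGroup] REMOVE DATABASE [MyDatabase];

    -- 3) Restore the database on the primary (see the restore task below for the
    --    full backup / log backup sequence).

    -- 4) Add the database back to the group (run on the primary)...
    ALTER AVAILABILITY GROUP [MyAvailabilityGroup] ADD DATABASE [MyDatabase];
    -- ...and, once the database has been restored/synchronized on each secondary, join it there:
    ALTER DATABASE [MyDatabase] SET HADR AVAILABILITY GROUP = [MyAvailabilityGroup];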

    Task 1: Switch the Primary and Secondary Servers

    1) Connect to both servers in the cluster in SQL Server Management Studio.

    2) On the Secondary server, expand the “Availability Groups” folder under the “AlwaysOn High Availability” folder.

    3) Right-click on the availability group containing the database to be restored and select “Failover…” from the context menu.  Click “Next >”.

    4) Select the new primary server.  Click “Next >”.

    5) Verify the choices and click “Finish”.

    6) Repeat steps 3-5 for the remaining availability group.

    Task 2: Remove the Database from the Availability Group

    A restore operation cannot be performed on a database that is part of an availability group, so the next task is to remove the database from the group.

    1) On the primary server, expand the list of Availability Databases for the availability group.

    2) Right-click the database to be restored and select “Remove Database from Availability Group…” from the context-menu.

    3) Click “OK” to remove the database from the availability group.

    Task 3: Restore the Database

    1) In the “Databases” folder on the primary server, right-click on the database to be restored and select “Properties” to open the “Database Properties” dialog.  Select the “Options” page, scroll down to the “Restrict Access” option, and change the value from MULTI_USER to SINGLE_USER.  Click “OK”.

    2) In the “Databases” folder on the primary server, right-click on the database to be restored and select Tasks->Restore->Database… from the context menu.

    3) On the “General” page of the “Restore Database” dialog, select the last Full backup and all Transaction log backups.

    4) Select the “Options” page of the “Restore Database” dialog and click the “Overwrite the existing database (WITH REPLACE)” option.  Click “OK”.

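    The restore can also be scripted.  Here is a hedged T-SQL sketch with placeholder file names; the actual backup files and the number of log backups will vary:

    -- Restore the last full backup, then each log backup, recovering only at the end.
    RESTORE DATABASE [MyDatabase]
        FROM DISK = N'\\BackupShare\MyDatabase_Full.bak'
        WITH REPLACE, NORECOVERY;

    RESTORE LOG [MyDatabase]
        FROM DISK = N'\\BackupShare\MyDatabase_Log_1.trn'
        WITH NORECOVERY;

    RESTORE LOG [MyDatabase]
        FROM DISK = N'\\BackupShare\MyDatabase_Log_2.trn'
        WITH RECOVERY;   -- the final restore brings the database online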

    Task 3: Add the Database Back to the Availability Group

    After the restore of the database to the primary server is complete, it can be put back into the availability group.

    1) In the “Databases” folder on the secondary server, right-click the database and select “Delete” from the context menu.  Click “OK”.

    2) Right-click “Availability Databases” in the availability group on the primary server and select “Add Database…” from the context menu.  Click “Next >”.

    3) Select the database to be added to the group and click “Next >”.

    4) Select “Full” as the data synchronization preference.  This will take a full backup of the database on the primary and restore it on the secondary server(s).  Specify a network location accessible to the primary and all secondary servers in which to place the backup files.  Click “Next >”.

    5) Use the “Connect…” button to establish a connection to the secondary server(s).  Click “Next >”.

    6) The “Add Database to Availability Group” wizard will validate all of the settings for the new availability group database.  When it completes, click “Next >”.

    7) Verify the choices and click “Finish” to add the database to the availability group.

    Final Tasks

    The restore of a database in an AlwaysOn Availability Group is now complete.

    At this point it is recommended to immediately perform a backup of the restored database on the secondary server.  This will establish a new backup chain for the database.

    The backup files created during synchronization of the primary and secondary server(s) can be deleted.  The location of those files was specified in Step 4 of the “Add the Database Back to the Availability Group” task.

    Note that the restored database should now be back in MULTI_USER mode.  Recall that it had been set to SINGLE_USER in Step 1 of the “Restore the Database” task.
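
    If you want to double-check the access mode, a quick query such as the following (the database name is a placeholder) will confirm it:

    SELECT name, user_access_desc
    FROM sys.databases
    WHERE name = N'MyDatabase';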

    Estimating SQL Server Table Sizes

    Microsoft gives guidance for calculating the estimated sizes of your tables and indexes at http://msdn.microsoft.com/en-us/library/ms175991.aspx.  Performing such estimates is an important step in determining required server capacity.

    Unfortunately, Microsoft’s guidance consists of long textual descriptions of the steps that need to be followed to perform the estimations.  It takes some time to interpret the instructions given by Microsoft and perform the calculations.

    With that in mind, I put together a spreadsheet to help me perform the calculations.  To use the spreadsheet, I simply fill out the spreadsheet with the total number of rows in the table and details of the columns (data types and sizes) in the table and/or index.  In my experience, the estimates provided by the spreadsheet are within 5-10% of the actual sizes of the tables/indexes, which has been good enough for my needs.

    You can download the spreadsheet here.

    To complete the spreadsheet, simply enter the requested data into the cells with black borders.  For example, enter the expected number of rows in the table in the cell labeled “Rows in Table”.  Enter the number of columns in the table in the cell labeled “# of Columns”.  Continue until you have specified all of the information about the table or index.  The spreadsheet uses the information to calculate the total space needed for the table or index.

    To help you get started, the spreadsheet includes sample data for a table that contains 1699488 rows and has six columns.  This table has two integer columns (one of which is the primary key), two datetime columns, one tinyint column, and one nvarchar column.  The average size of the data in the nvarchar column is 12 characters.  Compare these table specifications to the sample data in the spreadsheet for an example of how to map the specifications for your table/index into the spreadsheet.
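
    If you would rather script the arithmetic than use the spreadsheet, the following T-SQL walks through Microsoft’s formula for the sample table described above.  Like the spreadsheet, it is only a rough sketch; in particular it uses the average nvarchar length (12 characters) rather than the maximum, so treat the output as an estimate.

    DECLARE @Rows bigint = 1699488;
    DECLARE @NumCols int = 6, @NumVarCols int = 1;
    DECLARE @FixedDataSize int = 4 + 4 + 8 + 8 + 1;           -- two int, two datetime, one tinyint
    DECLARE @VarDataSize int = 2 + (@NumVarCols * 2) + 24;    -- nvarchar averaging 12 chars = 24 bytes
    DECLARE @NullBitmap int = 2 + ((@NumCols + 7) / 8);
    DECLARE @RowSize int = @FixedDataSize + @VarDataSize + @NullBitmap + 4;   -- + 4-byte row header
    DECLARE @RowsPerPage int = 8096 / (@RowSize + 2);         -- 2 bytes per row for the slot array
    DECLARE @Pages bigint = CEILING(1.0 * @Rows / @RowsPerPage);

    SELECT @RowSize AS RowSizeBytes,
           @RowsPerPage AS RowsPerPage,
           @Pages AS Pages,
           (@Pages * 8192) / 1048576.0 AS EstimatedSizeMB;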

    Note that while Microsoft provides guidance for estimating sizes of tables, clustered indexes, and non-clustered indexes, this spreadsheet only handles tables and clustered indexes.  I originally intended to add the calculations for non-clustered indexes, but simply did not find the time.  So, I am providing the spreadsheet as-is.  It should, at the very least, provide a good starting point for estimating table and index sizes.

    SQL Server Page Life Expectancy

    Page Life Expectancy (PLE) can be used to measure memory pressure on a SQL Server, though it should be used with some caution.

    Per Microsoft’s documentation, PLE "indicates the number of seconds a page will stay in the buffer pool without references."  In other words, it measures the average amount of time that a page will remain in the buffer pool before being removed.  PLE is measured in seconds.  The lower the value, the more memory pressure on the server.

    The PLE of a SQL Server instance can be checked using the following command:

    SELECT *
    FROM sys.dm_os_performance_counters
    WHERE [object_name] LIKE '%Manager%'
    AND [counter_name] = 'Page life expectancy'

    When examining the output of the query, note the “cntr_value” column; it contains the PLE value.

    Now, some reasons to be cautious when examining the PLE:

    1) Some years ago Microsoft provided some often-quoted (even today) guidance on the recommended threshold for the PLE.  That threshold was based on the hardware common when the guidance was given, and in the vast majority of cases is no longer valid today.  This blog post explains the problem very well, and provides a better way to establish what the PLE threshold should be for your server.  The recommendation given there is to use a threshold of (DataCacheSizeInGB / 4 GB) * 300.  (A T-SQL sketch of this calculation appears after this list.) 

    As is often the case when tuning and monitoring, the best method is to monitor your server to establish the baseline PLE value during periods of normal usage (and good performance), and to pay attention when the value drops and remains below that baseline.

    2) Many modern servers use Non-Uniform Memory Access (NUMA), which means that the buffer pool may be split into multiple nodes.  On such systems, PLE may not show the entire picture of what is happening on the server.  PLE is an average across all NUMA nodes, so if only one node is under memory pressure, it may not be clearly reflected in the average value given by the PLE.  More information can be found here.  (The sketch after this list includes a per-node PLE query.)

    3) Memory pressure may be due to inefficient query plans, so don’t immediately assume that the server lacks enough memory to handle the workload.
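
    For the first two points, a couple of quick queries can help.  The sketch below reads the data cache size from the Buffer Manager “Database pages” counter (each page is 8 KB) to compute the suggested threshold, and then lists PLE per NUMA node; adjust it as needed for your environment.

    -- Rough PLE threshold per the (DataCacheSizeInGB / 4 GB) * 300 guideline.
    DECLARE @DataCacheSizeGB decimal(10, 2);

    SELECT @DataCacheSizeGB = cntr_value * 8.0 / 1024 / 1024    -- pages are 8 KB
    FROM sys.dm_os_performance_counters
    WHERE [object_name] LIKE '%Buffer Manager%'
    AND [counter_name] = 'Database pages';

    SELECT @DataCacheSizeGB AS DataCacheSizeGB,
           (@DataCacheSizeGB / 4.0) * 300 AS SuggestedPleThreshold;

    -- Page Life Expectancy per NUMA node, for servers where the buffer pool is split across nodes.
    SELECT [object_name], instance_name, cntr_value AS PageLifeExpectancy
    FROM sys.dm_os_performance_counters
    WHERE [object_name] LIKE '%Buffer Node%'
    AND [counter_name] = 'Page life expectancy';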

    Configuring SQL Server Full-Text Search

    Here are step-by-step instructions for configuring tables in a SQL Server database for Full-Text search.  The following process should work for SQL Server 2005, 2008, 2008 R2, and 2012.

    For more information, see Microsoft’s documentation on configuring Full-Text search:

    SQL Server 2005
    SQL Server 2008
    SQL Server 2008 R2
    SQL Server 2012

    STEP 1:  Identify the tables to be full-text indexed. 

    Tables to be full-text indexed must have a unique, single-column, non-nullable index.  For this example, a single table will be indexed.  Here is the DDL for the table:

    CREATE TABLE dbo.Book
    (  
        BookID int IDENTITY(1,1) NOT NULL CONSTRAINT PK_Book PRIMARY KEY,  
        Title nvarchar(2000) NOT NULL DEFAULT(''), 
        Authors nvarchar(2000) NOT NULL DEFAULT('')
    )
    GO

    -- Sample data
    INSERT dbo.Book (Title, Authors)
    VALUES ('A manual of the British marine Algae','Harvey, William H.')
    INSERT dbo.Book (Title, Authors)
    VALUES ('A manual of marine zoology for the British Isles.','Gosse, Philip Henry,')
    INSERT dbo.Book (Title, Authors)
    VALUES ('A history of the British sea-anemones and corals','Gosse, Philip Henry,')
    INSERT dbo.Book (Title, Authors)
    VALUES ('The life of North American insects','Jaeger, Benedict.|Preston, Henry C.')
    INSERT dbo.Book (Title, Authors)
    VALUES ('Proceedings of the California Academy of Sciences.','California Academy of Sciences.')
    INSERT dbo.Book (Title, Authors)
    VALUES ('The romance of natural history','Gosse, Philip Henry,')
    INSERT dbo.Book (Title, Authors)
    VALUES ('British conchology : or, an account of the Mollusca which now inhabit the British Isles and the surrounding seas','Jeffreys, John Gwyn,')
    INSERT dbo.Book (Title, Authors)
    VALUES ('Proceedings of the Entomological Society of Philadelphia.','Entomological Society of Philadelphia.')
    INSERT dbo.Book (Title, Authors)
    VALUES ('Observations on the Pedipalpi of North America','Wood, H. C.')
    INSERT dbo.Book (Title, Authors)
    VALUES ('On the origin of species, or, The causes of the phenomena of organic nature : a course of six lectures to working men','Huxley, Thomas Henry,')
    GO

    This DDL represents a table that contains data about books.  The first column provides the unique, single-column, non-nullable index for the table, in the form of the table’s primary key.  The Title and Authors columns contain the searchable information about each book.  In a real-world scenario, the table would likely contain additional fields, both searchable and not. 

    A good strategy for full-text indexing is to create a set of denormalized tables just to support full-text search, and accumulate data from the primary relational model into those tables.  This strategy separates tables used for OLTP from tables that support search functionality.  The key to making this strategy effective is deciding on an appropriate method and schedule for syncing the data between the OLTP tables and the search tables.

    STEP 2: Enable the database for full-text indexing (SQL 2005 only)

    In SQL Server 2008 and above, databases are always full-text enabled, so no action is needed.  In SQL 2005, full-text indexing is enabled by default.  If for some reason it is disabled, it can be enabled by executing the following command:

    exec sp_fulltext_database 'enable'

    STEP 3 (OPTIONAL):  Edit the Noise Words list (SQL 2005 only)

    Noise Words are very common words that, if full-text indexed, decrease the accuracy of full-text searches.  Examples are words like ‘a’, ‘the’, and ‘or’.  SQL Server 2005 ships with default lists of Noise Words for many different languages.

    To view or edit the list, go to the database server and find the appropriate language file (for US English, the default file location is C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\FTData\noiseenu.txt).  If editing this file, don’t forget to make a backup!

    For this example, after editing, the content of the file will be:

    $
    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

    This represents the removal of most of the contents of the file.  The table being indexed contains data about books, and many “noise” words prove to be significant for books.  For example, with the default Noise Word list, a full-text search for "origin of species" may fail to find Darwin’s best-known work (notice the noise word “of” in that title).  That is why, for this example, the content of the noise word file has been greatly reduced.

    The need to modify the noise word list will vary by application; many times the file will not need to be edited.

    If the noise word file is edited after the full-text catalogs have been created, they will need to be rebuilt.

    STEP 4: Create a full-text catalog

    A full-text catalog is a physical (SQL Server 2005) or logical collection of full-text indexes.

    For SQL Server 2005, use the following command to set up a full-text catalog:

    CREATE FULLTEXT CATALOG FullTextSearchCatalog IN PATH 'D:\' 

    The path to the full-text catalog (in this case 'D:\') will be unique to the server on which SQL Server is installed.

    Note:  As a best practice, full-text catalogs should be created on their own physical drive.

    In SQL Server 2008 and after, a full-text catalog no longer has a physical implementation.  It is instead a logical concept that represents a group of full-text indexes, so no file paths are necessary.  Therefore, the command for setting up a full-text catalog in SQL Server 2008 and above is simply:

    CREATE FULLTEXT CATALOG FullTextSearchCatalog

    For information about other arguments for this command, look at this article on MSDN.

    STEP 5: Create a full-text index

    Use the CREATE FULLTEXT INDEX command to create a full-text index on a table, as shown here:

    CREATE FULLTEXT INDEX ON dbo.Book
    (  
        Title,  
        Authors  
    )
    KEY INDEX PK_Book ON FullTextSearchCatalog
    GO

    There are several things to notice about this command.  First, the index is being placed into the full-text catalog that was created in the previous step (FullTextSearchCatalog).  Next, the primary key of the table (PK_Book) is used to map table rows to full-text keys.  Finally, all columns to be full-text indexed (Title and Authors) are included in the index definition. 

    STEP 6 (OPTIONAL): Populate the full-text indexes

    This step is considered optional because the creation of the full-text index in the previous step should have automatically started the index population process.  However, this might need to be done again at a later date (if the noise words file is modified, for example).

    To fully populate the full-text index, use this command:

    ALTER FULLTEXT INDEX ON dbo.Book START FULL POPULATION
    GO

    Note that the following command will accomplish the same thing:

    ALTER FULLTEXT CATALOG FullTextSearchCatalog REBUILD
    GO

    It is also possible to update full-text indexes incrementally.  See Microsoft’s documentation for more information about populating full-text indexes.

    The status of index population can be verified by checking the PopulateStatus property of the full-text catalog.  Do this by using the FULLTEXTCATALOGPROPERTY function, as shown here:

    SELECT FULLTEXTCATALOGPROPERTY('FullTextSearchCatalog', 'PopulateStatus')

    A return value other than "0" indicates that index population operations are in progress.  This can be used to determine if the full-text indexes are online and available to respond to queries.  Prior to issuing a full-text query, a production system might check the full-text catalog status and issue an alternate (non-full-text) query if full-text indexes are offline.

    As of SQL Server 2012, the PopulateStatus full-text catalog property is deprecated.  New development work should instead check the TableFullTextPopulateStatus property of a particular table using the OBJECTPROPERTYEX function.  Again, a return value other than “0” indicates that index population operations are in progress.
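
    For example, here is a hedged sketch of that fallback pattern, using the table-level property against the dbo.Book table from this walkthrough (the LIKE query is only a simplistic stand-in for a real alternate query):

    IF OBJECTPROPERTYEX(OBJECT_ID('dbo.Book'), 'TableFulltextPopulateStatus') = 0
        -- Population is idle, so query the full-text index.
        SELECT b.BookID, b.Title, b.Authors
        FROM CONTAINSTABLE(dbo.Book, (Title, Authors), '"British"') x
             INNER JOIN dbo.Book b ON x.[KEY] = b.BookID;
    ELSE
        -- Fall back to a plain (and slower) pattern match.
        SELECT BookID, Title, Authors
        FROM dbo.Book
        WHERE Title LIKE '%British%' OR Authors LIKE '%British%';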

    STEP 7 (OPTIONAL): Edit the default Stop List (SQL 2008 and up)

    Stop Lists replace the Noise Word lists used in SQL 2005, and serve the same purpose (see STEP 3 above for more information).  Whereas Noise Word lists require manual editing of a text file on the server on which SQL Server is installed, Stop Lists can be modified via T-SQL scripts and the SQL Management Studio.

    Following are the commands used to configure Stop Lists on SQL 2008 and later.  First, create a Stop List:

    CREATE FULLTEXT STOPLIST FullTextStopList FROM SYSTEM STOPLIST;

    Once a stop list has been created, the sys.fulltext_stopwords table can be queried to view the contents of the Stop List.  Here is the command for viewing the English words in the Stop List:

    SELECT * FROM sys.fulltext_stopwords WHERE language = 'English'

    For this example, the English-language Stop List should contain only the entries that we included in the SQL 2005 Noise Words list (STEP 3).  Because this requires removing most of the default words in the Stop List, the easiest way to produce the desired list is to drop all of the default entries and re-add only those that are needed.  Here are the commands to achieve this:

    ALTER FULLTEXT STOPLIST FullTextStopList DROP ALL LANGUAGE 'English' ;

    ALTER FULLTEXT STOPLIST FullTextStopList ADD '$' LANGUAGE 'English' ;
    ALTER FULLTEXT STOPLIST FullTextStopList ADD 'A' LANGUAGE 'English' ;
    ALTER FULLTEXT STOPLIST FullTextStopList ADD 'B' LANGUAGE 'English' ;
    ALTER FULLTEXT STOPLIST FullTextStopList ADD 'C' LANGUAGE 'English' ;
    ALTER FULLTEXT STOPLIST FullTextStopList ADD 'D' LANGUAGE 'English' ;
    ALTER FULLTEXT STOPLIST FullTextStopList ADD 'E' LANGUAGE 'English' ;
    ALTER FULLTEXT STOPLIST FullTextStopList ADD 'F' LANGUAGE 'English' ;
    ALTER FULLTEXT STOPLIST FullTextStopList ADD 'G' LANGUAGE 'English' ;
    ALTER FULLTEXT STOPLIST FullTextStopList ADD 'H' LANGUAGE 'English' ;
    ALTER FULLTEXT STOPLIST FullTextStopList ADD 'I' LANGUAGE 'English' ;
    ALTER FULLTEXT STOPLIST FullTextStopList ADD 'J' LANGUAGE 'English' ;
    ALTER FULLTEXT STOPLIST FullTextStopList ADD 'K' LANGUAGE 'English' ;
    ALTER FULLTEXT STOPLIST FullTextStopList ADD 'L' LANGUAGE 'English' ;
    ALTER FULLTEXT STOPLIST FullTextStopList ADD 'M' LANGUAGE 'English' ;
    ALTER FULLTEXT STOPLIST FullTextStopList ADD 'N' LANGUAGE 'English' ;
    ALTER FULLTEXT STOPLIST FullTextStopList ADD 'O' LANGUAGE 'English' ;
    ALTER FULLTEXT STOPLIST FullTextStopList ADD 'P' LANGUAGE 'English' ;
    ALTER FULLTEXT STOPLIST FullTextStopList ADD 'Q' LANGUAGE 'English' ;
    ALTER FULLTEXT STOPLIST FullTextStopList ADD 'R' LANGUAGE 'English' ;
    ALTER FULLTEXT STOPLIST FullTextStopList ADD 'S' LANGUAGE 'English' ;
    ALTER FULLTEXT STOPLIST FullTextStopList ADD 'T' LANGUAGE 'English' ;
    ALTER FULLTEXT STOPLIST FullTextStopList ADD 'U' LANGUAGE 'English' ;
    ALTER FULLTEXT STOPLIST FullTextStopList ADD 'V' LANGUAGE 'English' ;
    ALTER FULLTEXT STOPLIST FullTextStopList ADD 'W' LANGUAGE 'English' ;
    ALTER FULLTEXT STOPLIST FullTextStopList ADD 'X' LANGUAGE 'English' ;
    ALTER FULLTEXT STOPLIST FullTextStopList ADD 'Y' LANGUAGE 'English' ;
    ALTER FULLTEXT STOPLIST FullTextStopList ADD 'Z' LANGUAGE 'English' ;

    An important difference from SQL Server 2005 Noise Word lists is that Stop Lists can be applied only to selected full-text indexes.  (SQL Server 2005 Noise Word lists affect the entire server.) 

    To associate the Stop List with a full-text indexed table, use the following command:

    ALTER FULLTEXT INDEX ON dbo.Book SET STOPLIST FullTextStopList;

    After associating a Stop List with a full-text index, the index should be repopulated (repeat STEP 6).

    CONCLUSION

    After completing these steps, it should be possible to submit full-text queries against the full-text indexed table.  Here is one example query that uses the new index:

    SELECT  b.BookID AS ID, b.Title, b.Authors, x.[RANK]
    FROM    CONTAINSTABLE(dbo.Book, (Title, Authors),
                          ‘("manual" and "British") or "Gosse"’) x
            INNER JOIN dbo.Book b ON x.[KEY] = b.BookID

    If everything has been configured correctly, this query will return the following results:

    ID Title                                             Authors             RANK
    1  A manual of the British marine Algae              Harvey, William H.   32
    2  A manual of marine zoology for the British Isles. Gosse, Philip Henry, 48
    3  A history of the British sea-anemones and corals  Gosse, Philip Henry, 48
    6  The romance of natural history                    Gosse, Philip Henry, 48

    Hope this helps!

    SQL Server – Finding the Source of Excessive I/O

    I recently had to troubleshoot a poorly-performing instance of SQL Server.  The cause turned out to be a single stored procedure that was producing excessive disk I/O. 

    Following is a description of the process I used to identify the problem and find its source. 

    This is a summary of the Windows server and SQL Server instance that was experiencing the problem:

    SQL Server 2005 Enterprise edition
    Windows 2008 R2 Standard edition
    23 separate user databases, 7 in use 24/7

    The first indication that something was wrong was slowness in multiple web sites and applications that accessed databases hosted on the server.  These sites and applications all accessed different databases, so it appeared that the problem was affecting the entire server.

    1. First I opened the SQL Server Management Studio Activity Monitor, and examined the four graphs at the top of the monitor.  The graphs of “% Processor Time”, “Waiting Tasks”, and “Batch Requests/sec” were all within the expected operating ranges.  The “Database I/O” graph, on the other hand, showed that I/O was running much higher than normal, and that the high level of I/O was nearly constant.

      [Screenshot: The graphs in the SQL Server Management Studio Activity Monitor show server metrics at a glance.]

    Note:  It is important that you are familiar with the normal operating ranges for these metrics on your server.  Without that knowledge, it is much harder to identify problems.  Every server is different, so you need to know your server.

    2. Next I used this script to examine the SQL Server wait conditions.  PAGEIOLATCH_SH was higher than normal.  I now had pretty clear indications that excessive I/O was the reason the server was performing poorly.
    3. The first step I took to find the cause of the excessive I/O was to check for abnormal behavior from the applications that were accessing the server.  Primary suspects were two web sites backed by the SQL Server instance, but neither was experiencing any unusual loads.  No other applications were showing signs of unusual database activity.
    4. Since the applications accessing the server checked out, the next step was to look at the server itself.  I opened the Task Manager on the server and confirmed that no unexpected processes (such as backup or antivirus applications) were running.
    5. Still on the server, I clicked on the “Performance” tab of the Task Manager and then clicked the “Resource Monitor…” button to open the Resource Monitor.  The Resource Monitor provides more detailed statistics for CPU, Disk, Network, and Memory resources.  By examining the Disk section, I was able to identify one particular database that was showing unusually high I/O numbers.  For that database, both “Total Bytes/sec” and “Read Bytes/sec” were high. 

      [Screenshot: The Windows Resource Monitor gives in-depth details about server resource usage.]

      I had now narrowed my search for the problem down to one particular database on the server.  In the past, I have seen out-of-date statistics cause unusual query engine behavior… such as simple queries taking much longer to run than they should and consuming far more resources than necessary.  That knowledge led me to the next step.

    6. I executed UPDATE STATISTICS for the major tables in the database that was showing excessive I/O.  However, this had no effect.

      At this point, I suspected that the problem was that the query plan for one stored procedure was bad.  Why?  Simply a hunch based on past experience… I’ve encountered the problem before.  I have seen that one bad procedure query plan can have the same effect as bad statistics.   The tricky part is finding the procedure that is the problem, because procedures and queries across all databases on the server run slow when the machine is starved for resources.

      Note: There are commands like SET STATISTICS IO and SET STATISTICS TIME that help you to tune your SQL procedures and queries, but they are only useful if you know where the problem is.  They are great if you need to tune an individual query/procedure, but they don’t help you find the one procedure out of hundreds that is slowing everything down.

    7. The next step was to start the SQL Server Profiler and do the following…

      a) Set up a trace that watched "RPC:Completed" events.
      b) Set a filter for “Duration” greater than or equal to 5000.  Ideally I would have set filters on “Reads” and “Writes”, but I had no idea what read/write thresholds I was looking for… so instead I chose to look at just the particularly long-running queries.
      c) Set a filter on “Database Name” to view only events in the problem database. 
      d) Collect 10 minutes of data.

      Paying close attention to the I/O statistics (Reads/Writes), I was able to identify one particular procedure that was producing around 1,500,000 reads each time it was executed.  This was more than five times the number of reads produced by other known "expensive" queries.  In addition, this procedure was being run over and over… as soon as one execution completed, another was started.  This was OK (that was the nature of the application that was invoking the procedure), but it was a further indication that this was the culprit… the procedure was always running.  (Recall that the initial investigation showed high I/O at a nearly constant level.)  A DMV-based alternative for finding high-read statements without a trace is sketched after this list.

      [Screenshot: One procedure (highlighted) was producing 1,500,000 reads each time it was executed.]

    8. Now that I had identified one procedure as the cause of the excessive I/O, the next task was to tune the procedure.  Query tuning is a topic about which many articles (and books!) have been written, so I won’t cover those details here.  Ultimately I was able to reduce the number of reads and CPU usage of the procedure by a factor of 10.  Success!
    9. The last step was to confirm that I/O usage dropped to normal levels after the modifications to the stored procedure, and to continue monitoring the server to be certain that the problem had been corrected.
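
      For reference, a trace-free way to hunt for high-read statements is to query the plan cache statistics DMVs (sys.dm_exec_query_stats and sys.dm_exec_sql_text), which are available from SQL Server 2005 onward.  This is a general-purpose sketch, not the exact query used during this investigation:

      -- Top 10 cached statements by total logical reads.
      SELECT TOP 10
             qs.total_logical_reads,
             qs.execution_count,
             qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
             SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
                 ((CASE qs.statement_end_offset
                       WHEN -1 THEN DATALENGTH(st.text)
                       ELSE qs.statement_end_offset
                   END - qs.statement_start_offset) / 2) + 1) AS statement_text
      FROM sys.dm_exec_query_stats qs
      CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
      ORDER BY qs.total_logical_reads DESC;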

    With this process, I was able to successfully identify and correct the cause of excessive I/O and poor performance on SQL Server.  I hope you find this helpful in troubleshooting your own performance problems.