Data Access Framework Comparison

Introduction

For some time now I have been working on a project that utilizes a custom-built data access framework, rather than popular ORM frameworks such as Entity Framework or NHibernate.

While the custom framework has worked well for the project, I had questions about it.  For example, it uses stored procedures to implement basic CRUD operations, and I wondered if inline parameterized SQL statements might perform better.  Also, I wondered about the performance of the custom framework compared to the leading ORMs.

Besides my questions about the custom framework, I recognized the importance of having at least a basic understanding of how to use the other ORM frameworks.

In order to answer my questions about the custom framework and to gain some practical experience with the other ORMs, I created a simple web application that uses each of those frameworks to perform basic CRUD operations.  While executing the CRUD operations, the application times them and produces a summary report of the results.

The code for the test application can be found at https://github.com/mlichtenberg/ORMComparison.

NOTE: I assume that most readers are familiar with the basics of Entity Framework and NHibernate, so I will not provide an overview of them here.

Using the custom framework is similar to Entity Framework and NHibernate’s “database-first” approach.  Any project that uses the library references a single assembly containing the base functionality of the library.  A T4 template is used to generate additional classes based on tables in a SQL Server database.  Some of the classes are similar to EF’s Model classes and NHibernate’s Domain classes.  The others provide the basic CRUD functionality for the domain/model classes. 
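
As a rough illustration, the generated model class for the Enrollment table might look something like the sketch below.  This is a hypothetical reconstruction based on the test code shown later in this post, not the actual generated output; the companion DA.EnrollmentDAL class (not shown) exposes the corresponding CRUD methods.

    // Hypothetical sketch of a T4-generated model class; the real generated
    // code is more extensive.  A companion DA.EnrollmentDAL class provides the
    // Insert/Select/Update/Delete methods for this type.
    namespace DataObjects
    {
        public class Enrollment
        {
            public int EnrollmentID { get; set; }
            public int CourseID { get; set; }
            public int StudentID { get; set; }
            public int? Grade { get; set; }
        }
    }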

For these tests I made a second copy of the custom framework classes that provide the basic CRUD functionality, and edited them to replace the CRUD stored procedures with parameterized SQL statements.
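
The difference between the two versions boils down to the command that is sent through ADO.NET.  The following is only an illustrative sketch, not the actual generated framework code; the procedure, column, and parameter names are assumptions, and connectionString is assumed to be defined elsewhere.

    // Illustrative sketch only; not the actual generated framework code.
    // Requires System.Data and System.Data.SqlClient.

    // Stored procedure version (assumes the procedure selects the new identity value).
    using (SqlConnection conn = new SqlConnection(connectionString))
    using (SqlCommand cmd = new SqlCommand("EnrollmentInsert", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.AddWithValue("@CourseID", 101);
        cmd.Parameters.AddWithValue("@StudentID", 1);
        cmd.Parameters.AddWithValue("@Grade", DBNull.Value);
        conn.Open();
        int newId = Convert.ToInt32(cmd.ExecuteScalar());
    }

    // Parameterized SQL version; identical except for the command text and type.
    using (SqlConnection conn = new SqlConnection(connectionString))
    using (SqlCommand cmd = new SqlCommand(
        "INSERT INTO Enrollment (CourseID, StudentID, Grade) " +
        "VALUES (@CourseID, @StudentID, @Grade); SELECT SCOPE_IDENTITY();", conn))
    {
        cmd.Parameters.AddWithValue("@CourseID", 101);
        cmd.Parameters.AddWithValue("@StudentID", 1);
        cmd.Parameters.AddWithValue("@Grade", DBNull.Value);
        conn.Open();
        int newId = Convert.ToInt32(cmd.ExecuteScalar());
    }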

The custom framework includes much less overhead on top of ADO.NET than the popular ORMs, so I expected the tests to show that it was the best-performing framework.  The question was, how much better?

In the rest of this post, I will describe the results of my experiment, as well as some of the optimization tips I learned along the way.  Use the following links to jump directly to a topic.

Test Application Overview
“Out-of-the-Box” Performance
Entity Framework Performance After Code Optimization
     AutoDetectChangesEnabled and DetectChanges()
     Recycling the DbContext
NHibernate Performance After Configuration Optimization
     What’s Up with Update Performance in NHibernate?
Results Summary

Test Application Overview

A SQL Express database was used for the tests.  The data model is borrowed from Microsoft’s Contoso University sample application.  Here is the ER diagram for the database:

[ER diagram of the test database]

The database was pre-populated with sample data.  The number of rows added to each table was as follows:

Department: 20
Course: 200
Person: 100000
Enrollment: 200000

This was done because SQL Server’s optimizer will behave differently with an empty database than it will with a database containing data, and I wanted the database to respond as it would in a “real-world” situation.  For the tests, all CRUD operations were performed against the Enrollment table.

Five different data access frameworks were tested:

  1. Custom framework with stored procedures
  2. Custom framework with parameterized SQL statements
  3. Entity Framework
  4. NHibernate
  5. Fluent NHibernate

The testing algorithm follows the same pattern for each of the frameworks:

01) Start timer
02) For a user-specified number of iterations 
03)      Submit an INSERT statement to the database
04)      Save the identifier of the new database record
05) End timer
06) Start timer
07) For each new database record identifier
08)      Submit a SELECT statement to the database
09) End timer
10) Start timer
11) For each new database record identifier
12)      Submit an UPDATE statement to the database
13) End timer
14) Start timer
15) For each new database record identifier
16)      Submit a DELETE statement to the database
17) End timer

Note that after the test algorithm completes, the database is in the same state as when the tests began.

To see the actual code, visit https://github.com/mlichtenberg/ORMComparison/blob/master/MVCTestHarness/Controllers/TestController.cs.
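
For orientation, the pattern above translates into code along these lines.  This is a minimal sketch using System.Diagnostics.Stopwatch; the variable names are illustrative, and the controller linked above contains the actual implementation.

    // Minimal sketch of one timed block (the INSERT loop); the other
    // operations follow the same Stopwatch pattern.
    List<int> ids = new List<int>();
    Stopwatch timer = Stopwatch.StartNew();

    for (int x = 0; x < Convert.ToInt32(iterations); x++)
    {
        // Framework-specific INSERT goes here; save the new record's identifier.
        // ids.Add(newEnrollmentId);
    }

    timer.Stop();
    double insertSeconds = timer.Elapsed.TotalSeconds;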

"Out-of-the-Box" Performance

I first created very basic tests for each framework. Essentially, these were the “Hello World” versions of the CRUD code for each framework.  No optimization was attempted.

Here is an example of the code that performs the INSERTs for the custom framework.  There is no difference between the version with stored procedures and the version without, other than the namespace from which EnrollmentDAL is instantiated.

    DA.EnrollmentDAL enrollmentDAL = new DA.EnrollmentDAL();

    for (int x = 0; x < Convert.ToInt32(iterations); x++)
    {
        DataObjects.Enrollment enrollment = enrollmentDAL.EnrollmentInsertAuto
            (null, null, 101, 1, null);
        ids.Add(enrollment.EnrollmentID);
    }

And here is the equivalent code for Entity Framework:

    using (SchoolContext db = new SchoolContext())
    {
       for (int x = 0; x < Convert.ToInt32(iterations); x++)
        {
            Models.Enrollment enrollment = new Models.Enrollment {
                CourseID = 101, StudentID = 1, Grade = null };
            db.Enrollments.Add(enrollment);
            db.SaveChanges();
            ids.Add(enrollment.EnrollmentID);
        }

    }

The code for NHibernate and Fluent NHibernate is almost identical.  Here is the NHibernate version:

using (var session = NH.NhibernateSession.OpenSession("SchoolContext"))
{
    var course = session.Get<NHDomain.Course>(101);
    var student = session.Get<NHDomain.Person>(1);

    for (int x = 0; x < Convert.ToInt32(iterations); x++)
    {
        var enrollment = new NHDomain.Enrollment { 
            Course = course, Person = student, Grade = null };
        session.SaveOrUpdate(enrollment);

        ids.Add(enrollment.Enrollmentid);
    }

}

The SELECT, UPDATE, and DELETE code for each framework followed similar patterns. 
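
For example, the Entity Framework versions of those operations have roughly the following shape.  This is a simplified sketch rather than the exact test code, which times each operation in its own loop.

    using (SchoolContext db = new SchoolContext())
    {
        foreach (int id in ids)
        {
            // SELECT: load the entity by primary key
            Models.Enrollment enrollment = db.Enrollments.Find(id);

            // UPDATE: assign new values to the tracked entity's properties
            // (e.g. enrollment.Grade), then persist them
            db.SaveChanges();

            // DELETE: remove the entity and persist the change
            db.Enrollments.Remove(enrollment);
            db.SaveChanges();
        }
    }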

    NOTE: A SQL Server Profiler trace proved that the actual interactions with the database were the same for each framework.  The same database connections were established, and equivalent CRUD statements were submitted by each framework.  Therefore, any measured differences in performance are due to the overhead of the frameworks themselves.

        Here are the results of the tests of the “out-of-the-box” code:

      Framework              Operation     Elapsed Time (seconds)
      Custom                 Insert        5.9526039
      Custom                 Select        1.9980745
      Custom                 Update        5.0850357
      Custom                 Delete        3.7785886

      Custom (no SPs)        Insert        5.2251725
      Custom (no SPs)        Select        2.0028176
      Custom (no SPs)        Update        4.5381994
      Custom (no SPs)        Delete        3.7064278

      Entity Framework       Insert        1029.5544975
      Entity Framework       Select        8.6153572
      Entity Framework       Update        2362.7183765
      Entity Framework       Delete        25.6118191

      NHibernate             Insert        9.9498188
      NHibernate             Select        7.3306331
      NHibernate             Update        274.7429862
      NHibernate             Delete        12.4241886

      Fluent NHibernate      Insert        11.796126
      Fluent NHibernate      Select        7.3961941
      Fluent NHibernate      Update        283.1575124
      Fluent NHibernate      Delete        10.791648

      NOTE: For all tests, each combination of Framework and Operation was executed 10000 times.  Looking at the first line of the preceding results, this means that the custom framework took roughly 5.95 seconds to perform 10000 INSERTs.

      As you can see, both instances of the custom framework outperformed Entity Framework and NHibernate.  In addition, the version of the custom framework that used parameterized SQL was slightly faster than the version that used stored procedures.  Most interesting, however, was the performance of the INSERT and UPDATE operations.  Entity Framework and both versions of NHibernate were not just worse than the two custom framework versions; they were much, MUCH worse.  Clearly, some optimization and/or configuration changes were needed.

      Entity Framework Performance After Code Optimization

      AutoDetectChangesEnabled and DetectChanges()  

      It turns out that much of Entity Framework’s poor performance appears to have been due to the nature of the tests themselves.  Information on Microsoft’s MSDN website notes that if you are tracking a lot of objects in your DbContext object and call methods like Add() and SaveChanges() many times in a loop, your performance may suffer.  That scenario describes the test almost perfectly.

      The solution is to turn off Entity Framework’s automatic detection of changes by setting AutoDetectChangesEnabled to false and explicitly calling DetectChanges().  This instructs Entity Framework to only detect changes to entities when explicitly instructed to do so.  Here is what the updated code for performing INSERTs with Entity Framework looks like (changes highlighted in red):

      using (SchoolContext db = new SchoolContext())
      {
          db.Configuration.AutoDetectChangesEnabled = false;

          for (int x = 0; x < Convert.ToInt32(iterations); x++)
          {
              Models.Enrollment enrollment = new Models.Enrollment {
                  CourseID = 101, StudentID = 1, Grade = null };
              db.Enrollments.Add(enrollment);
              db.ChangeTracker.DetectChanges();
              db.SaveChanges();
              ids.Add(enrollment.EnrollmentID);
          }
      }

      Here are the results of tests with AutoDetectChangesEnabled set to false:

      Framework           Operation    Elapsed Time (seconds)
      Entity Framework    Insert       606.5569332
      Entity Framework    Select       6.4425741
      Entity Framework    Update       605.6206616
      Entity Framework    Delete       21.0813293

      As you can see, INSERT and UPDATE performance improved significantly, and SELECT and DELETE performance also improved slightly.

      Note that turning off AutoDetectChangesEnabled and calling DetectChanges() explicitly in all cases WILL slightly improve the performance of Entity Framework.  However, it could also cause subtle bugs.  Therefore, it is best to only use this optimization technique in very specific scenarios and allow the default behavior otherwise.
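
      To illustrate the kind of subtle bug this can introduce, here is a hypothetical example (not taken from the test application); someId is only a placeholder:

      using (SchoolContext db = new SchoolContext())
      {
          db.Configuration.AutoDetectChangesEnabled = false;

          Models.Enrollment enrollment = db.Enrollments.Find(someId);
          enrollment.CourseID = 999;   // modify a property (value is arbitrary for illustration)

          // Because automatic change detection is off and DetectChanges() is never
          // called, SaveChanges() does not see the modification and no UPDATE
          // statement is sent to the database.
          db.SaveChanges();
      }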

      Recycling the DbContext

      While Entity Framework performance certainly improved by changing the AutoDetectChangesEnabled value, it was still relatively poor. 

      Another problem with the tests is that the same DbContext was used for every iteration of an operation (i.e. one DbContext object was used for all 10000 INSERT operations).  This is a problem because the context maintains a record of all entities added to it during its lifetime.  The effect of this was a gradual slowdown of the INSERT (and UPDATE) operations as more and more entities were added to the context.

      Here is what the Entity Framework INSERT code looks like after modifying it to periodically create a new Context (changes highlighted in red):

      for (int x = 0; x < Convert.ToInt32(iterations); x++)
      {
          // Use a new context after every 100 Insert operations
          using (SchoolContext db = new SchoolContext())
          {
              db.Configuration.AutoDetectChangesEnabled = false;

              int count = 1;
              for (int y = x; y < Convert.ToInt32(iterations); y++)
              {
                  Models.Enrollment enrollment = new Models.Enrollment {
                      CourseID = 101, StudentID = 1, Grade = null };
                  db.Enrollments.Add(enrollment);
                  db.ChangeTracker.DetectChanges();
                  db.SaveChanges();
                  ids.Add(enrollment.EnrollmentID);

                  count++;
                  if (count >= 100) break;
                  x++;
              }
          }
      }

      And here are the results of the Entity Framework tests with the additional optimization added:

      Framework            Operation     Elapsed Time (seconds)
      Entity Framework     Insert        14.7847024
      Entity Framework     Select        5.5516514
      Entity Framework     Update        13.823694
      Entity Framework     Delete        10.0770142

      Much better!  The time to perform the SELECT operations was little changed, but the DELETE time was reduced by half, and the INSERT and UPDATE times decreased from a little more than 10 minutes to about 14 seconds.

      NHibernate Performance After Configuration Optimization

      For the NHibernate frameworks, the tests themselves were not the problem.  NHibernate itself needs some tuning. 

      An optimized solution was achieved by changing the configuration settings used to build the NHibernate session factory.  Here is the definition of the SessionFactory for NHibernate (additions highlighted in red):

      private static ISessionFactory SessionFactory
      {
          get
          {
              if (_sessionFactory == null)
              {
                  string connectionString = ConfigurationManager.ConnectionStrings
                      [_connectionKeyName].ToString();

                  var configuration = new NHConfig.Configuration();
                  configuration.Configure();

                  configuration.SetProperty(NHConfig.Environment.ConnectionString,
                      connectionString);

                  configuration.SetProperty(NHibernate.Cfg.Environment.FormatSql,
                      Boolean.FalseString);
                  configuration.SetProperty
                     (NHibernate.Cfg.Environment.GenerateStatistics,
                          Boolean.FalseString);
                  configuration.SetProperty
                     (NHibernate.Cfg.Environment.Hbm2ddlKeyWords,
                          NHConfig.Hbm2DDLKeyWords.None.ToString());
                  configuration.SetProperty(NHibernate.Cfg.Environment.PrepareSql,
                          Boolean.TrueString);
                  configuration.SetProperty
                      (NHibernate.Cfg.Environment.PropertyBytecodeProvider,
                          "lcg");
                  configuration.SetProperty
                      (NHibernate.Cfg.Environment.PropertyUseReflectionOptimizer,
                          Boolean.TrueString);
                  configuration.SetProperty
                      (NHibernate.Cfg.Environment.QueryStartupChecking,
                          Boolean.FalseString);
                  configuration.SetProperty(NHibernate.Cfg.Environment.ShowSql, 
                      Boolean.FalseString);
                  configuration.SetProperty
                      (NHibernate.Cfg.Environment.UseProxyValidator, 
                          Boolean.FalseString);
                  configuration.SetProperty
                      (NHibernate.Cfg.Environment.UseSecondLevelCache,
                          Boolean.FalseString);

                  configuration.AddAssembly(typeof(Enrollment).Assembly);
                  _sessionFactory = configuration.BuildSessionFactory();
              }
              return _sessionFactory;
          }
      }

      And here is the InitializeSessionFactory method for Fluent NHibernate, with the equivalent changes included:

      private static void InitializeSessionFactory()
      {
          string connectionString = ConfigurationManager.ConnectionStrings[_connectionKeyName]
              .ToString();

          _sessionFactory = Fluently.Configure()
              .Database(MsSqlConfiguration.MsSql2012.ConnectionString(connectionString).ShowSql())
              .Mappings(m => m.FluentMappings.AddFromAssemblyOf<Enrollment>())
              .BuildConfiguration().SetProperty
                  (NHibernate.Cfg.Environment.FormatSql, Boolean.FalseString)
              .SetProperty(NHibernate.Cfg.Environment.GenerateStatistics,
                  Boolean.FalseString)
              .SetProperty(NHibernate.Cfg.Environment.Hbm2ddlKeyWords,
                  NHibernate.Cfg.Hbm2DDLKeyWords.None.ToString())
              .SetProperty(NHibernate.Cfg.Environment.PrepareSql,
                  Boolean.TrueString)
              .SetProperty(NHibernate.Cfg.Environment.PropertyBytecodeProvider,
                  "lcg")
              .SetProperty
                  (NHibernate.Cfg.Environment.PropertyUseReflectionOptimizer,
                      Boolean.TrueString)
              .SetProperty(NHibernate.Cfg.Environment.QueryStartupChecking,
                  Boolean.FalseString)
              .SetProperty(NHibernate.Cfg.Environment.ShowSql, Boolean.FalseString)
              .SetProperty(NHibernate.Cfg.Environment.UseProxyValidator,
                  Boolean.FalseString)
              .SetProperty(NHibernate.Cfg.Environment.UseSecondLevelCache,
                  Boolean.FalseString)
              .BuildSessionFactory();
      }

      The following table gives a brief description of the purpose of these settings:

      Setting                   Purpose
      FormatSql                 Format the SQL before sending it to the database
      GenerateStatistics        Produce statistics on the operations performed
      Hbm2ddlKeyWords           Whether NHibernate should automatically quote all db object names
      PrepareSql                Compiles the SQL before executing it
      PropertyBytecodeProvider  What bytecode provider to use for the generation of code
      QueryStartupChecking      Check all named queries present in the startup configuration
      ShowSql                   Show the produced SQL
      UseProxyValidator         Validate that mapped entities can be used as proxies
      UseSecondLevelCache       Enable the second level cache

      Notice that several of these (FormatSQL, GenerateStatistics, ShowSQL) are most useful for debugging.  It is not clear why they are enabled by default in NHibernate; it seems to me that these should be opt-in settings, rather than opt-out.

      Here are the results of tests of the NHibernate frameworks with these changes in place:

      Framework                        Operation     Elapsed Time (seconds)
      NHibernate (Optimized)           Insert        5.0894047
      NHibernate (Optimized)           Select        5.2877312
      NHibernate (Optimized)           Update        133.9417387
      NHibernate (Optimized)           Delete        5.6669841

      Fluent NHibernate (Optimized)    Insert        5.0175024
      Fluent NHibernate (Optimized)    Select        5.2698945
      Fluent NHibernate (Optimized)    Update        128.3563561
      Fluent NHibernate (Optimized)    Delete        5.5299521

      These results are much improved, with the INSERT, SELECT, and DELETE operations nearly matching the results achieved by the custom framework.   The UPDATE performance, while improved, is still relatively poor.

      What’s Up with Update Performance in NHibernate?

      The poor update performance is a mystery to me.  I have researched NHibernate optimization techniques and configuration settings, and have searched for other people reporting problems with UPDATE operations.  Unfortunately, I have not been able to find a solution.

      This is disappointing, as I personally found NHibernate more comfortable to work with than Entity Framework, and because it beats or matches the performance of Entity Framework for SELECT, INSERT, and DELETE operations.

      If anyone out there knows of a solution, please leave a comment!

      Results Summary

      The following table summarizes the results of the tests using the optimal configuration for each framework.  These are the same results shown earlier in this post, combined here in a single table.

      Framework                        Operation     Elapsed Time (seconds)
      Custom                           Insert        5.9526039
      Custom                           Select        1.9980745
      Custom                           Update        5.0850357
      Custom                           Delete        3.7785886

      Custom (no SPs)                  Insert        5.2251725
      Custom (no SPs)                  Select        2.0028176
      Custom (no SPs)                  Update        4.5381994
      Custom (no SPs)                  Delete        3.7064278

      Entity Framework (Optimized)     Insert        14.7847024
      Entity Framework (Optimized)     Select        5.5516514
      Entity Framework (Optimized)     Update        13.823694
      Entity Framework (Optimized)     Delete        10.0770142

      NHibernate (Optimized)           Insert        5.0894047
      NHibernate (Optimized)           Select        5.2877312
      NHibernate (Optimized)           Update        133.9417387
      NHibernate (Optimized)           Delete        5.6669841

      Fluent NHibernate (Optimized)    Insert        5.0175024
      Fluent NHibernate (Optimized)    Select        5.2698945
      Fluent NHibernate (Optimized)    Update        128.3563561
      Fluent NHibernate (Optimized)    Delete        5.5299521

      And here is a graph showing the same information:

      [Bar graph comparing elapsed times for each framework and operation]


    Recommended Tool: Express Profiler for SQL Server Databases

    NOTE:  As I was writing up this post I discovered the news that SQL Profiler is deprecated as of the release of SQL Server 2016.  If this also affects the underlying SQL Server tracing APIs, then this news may affect the long-term future of the Express Profiler.  For now, however, it is a tool that I recommend.

    Express Profiler is a simple Open Source alternative to the SQL Profiler that ships with the full SQL Server Management Studio.  This is particularly useful when working with SQL Server Express databases, as the Express version of the Management Studio does NOT include the SQL Profiler.

    Usage of the Express Profiler should be self-explanatory to anyone familiar with the SQL Profiler.

    Here are some details about Express Profiler from the project page:

    • ExpressProfiler (aka SqlExpress Profiler) is a simple and fast replacement for SQL Server Profiler with basic GUI
    • Can be used with both Express and non-Express editions of SQL Server 2005/2008/2008r2/2012/2014 (including LocalDB)
    • Tracing of basic set of events (Batch/RPC/SP:Stmt Starting/Completed, Audit login/logout, User error messages, Blocked Process report) and columns (Event Class, Text Data,Login, CPU, Reads, Writes, Duration, SPID, Start/End time, Database/Object/Application name) – both selectable
    • Filters on most data columns
    • Copy all/selected event rows to clipboard in form of XML
    • Find in "Text data" column
    • Export data in Excel’s clipboard format

    While I have found Express Profiler to be a good and useful tool, it is not as fully-featured as the SQL Profiler.  Here are some key "missing" features in Express Profiler:

    • No way to load a saved trace output, although that feature is on the roadmap for the tool.
    • No way to save trace output directly to a database table.
    • Fewer columns can be included in the trace output, and many fewer events can be traced.  In my experience, however, the columns and events that I find myself using in most cases are all available.
    • As there are fewer columns in the output, there are fewer columns on which to filter.  Again, the most common/useful columns and events are covered.
    • No way to create trace templates for use with future traces.

    Despite these limitations, I recommend this tool for situations where the full SQL Profiler is not available.

    How To Restore a Database in a SQL Server AlwaysOn Availability Group

    Edited March 3, 2017 in response to reader comments.  Thanks for pointing out my mistakes!

    Background

    There are two clustered servers running SQL Server 2014.  The servers host production databases as well as databases used for QA and testing.  One AlwaysOn Availability Group has been created for the production databases and one for the QA databases.  The same server serves as the primary for both AlwaysOn Availability Groups.

    One of the production databases needs to be restored from backup.  Backups are taken from the secondary server, not the primary.  The backups should be restored to the same server from which they were taken.

    The following tasks need to be completed in order to restore the database:

    • Make the secondary server from which the backups were taken the primary server
    • Remove the database to be restored from the AlwaysOn Availability Group
    • Restore the database
    • Add the database back into the AlwaysOn Availability Group

    Following are detailed instructions for completing these tasks.

    Task 1: Switch the Primary and Secondary Servers

    1) Connect to both servers in the cluster in SQL Server Management Studio.

    2) On the Secondary server, expand the “Availability Groups” folder under the “AlwaysOn High Availability” folder.


    3) Right-click on the availability group containing the database to be restored and select “Failover…” from the context menu.  Click “Next >”.


    4) Select the new primary server.  Click “Next >”.


    5) Verify the choices and click “Finish”.


    6) Repeat steps 3-5 for the remaining availability group.

    Task 2: Remove the Database from the Availability Group

    A restore operation cannot be performed on a database that is part of an availability group, so the next task is to remove the database from the group.

    1) On the primary server, expand the list of Availability Databases for the availability group.


    2) Right-click the database to be restored and select “Remove Database from Availability Group…” from the context-menu.


    3) Click “OK” to remove the database from the availability group.


    Task 3: Restore the Database

    1) In the “Databases” folder on the primary server, right-click on the database to be restored and select “Properties” to open the “Database Properties” dialog.  Select the “Options” page, scroll down to the “Restrict Access” option, and change the value from MULTI_USER to SINGLE_USER.  Click “OK”.


    2) In the “Databases” folder on the primary server, right-click on the database to be restored and select Tasks->Restore->Database… from the context menu.

    3) On the “General” page of the “Restore Database” dialog, select the last Full backup and all Transaction log backups.


    4) Select the “Options” page of the “Restore Database” dialog and click the “Overwrite the existing database (WITH REPLACE)” option.  Click “OK”.


    Task 4: Add the Database Back to the Availability Group

    After the restore of the database to the primary server is complete, it can be put back into the availability group.

    1) In the “Databases” folder on the secondary server, right-click the database and select “Delete” from the context menu.  Click “OK”.


    2) Right-click “Availability Databases” in the availability group on the primary server and select “Add Database…” from the context menu.  Click “Next >”.


    3) Select the database to be added to the group and click “Next >”.


    4) Select “Full” as the data synchronization preference.  This will take a full backup of the database on the primary and restore it on the secondary server(s).  Specify a network location accessible to the primary and all secondary servers in which to place the backup files.  Click “Next >”.


    5) Use the “Connect…” button to establish a connection to the secondary server(s).  Click “Next >”.


    6) The “Add Database to Availability Group” wizard will validate all of the settings for the new availability group database.  When it completes, click “Next >”.


    7) Verify the choices and click “Finish” to add the database to the availability group.


    Final Tasks

    The restore of a database in an AlwaysOn Availability Group is now complete.

    At this point it is recommended to immediately perform a backup of the restored database on the secondary server.  This will establish a new backup chain for the database.

    The backup files created during synchronization of the primary and secondary server(s) can be deleted.  The location of those files was specified in Step 4 of the “Add the Database Back to the Availability Group” task.

    Note that the restored database should now be back in MULTI_USER mode.  Recall that it had been set to SINGLE_USER in Step 1 of the “Restore the Database” task.

    “Count” and “Count Distinct” Queries in MongoDB

    For the following examples, assume that you have a collection containing four records with the following fields and values:

    { "_id" : ObjectId("54936…dd0c"), "last_name" : "smith", "first_name" : "mike" }
    { "_id" : ObjectId("54936…dd0d"), "last_name" : "smith", "first_name" : "william" }
    { "_id" : ObjectId("54936…dd0e"), "last_name" : "smith", "first_name" : "william" }
    { "_id" : ObjectId("54936…dd0f"), "last_name" : "smith", "first_name" : "mark" }

    Note that there are four records with a last_name value of “smith”.  The four records have three distinct values for the first_name field (“mike”, “william”, and “mark”).

    To count the number of rows returned by a query, use "count()", as shown here:

    > db.collection.find({"last_name":"smith"}).count();

    4

    To count the unique values, use "distinct()" rather than "find()", and "length" rather than "count()".  The first argument of "distinct" is the field for which to aggregate distinct values; the second is the conditional statement that specifies which rows to select.  Append "length" to the end of the query to count the number of rows returned.  (The "count()" function does not work on the results of a "distinct" query.)

    Here is an example which counts the distinct number of first_name values for records with a last_name value of “smith”:

    > db.collection.distinct("first_name", {"last_name":"smith"}).length;

    3

    St. Louis Days of .NET 2014

    My notes from the 2014 edition of St. Louis Days of .NET.  I was only able to attend the first day of the conference this year.

    Front-End Design Patterns: SOLID CSS + JS for Backend Developers

    Presenter: https://twitter.com/anthony_vdh
    Session Materials:  http://vimeo.com/97315940

    Use namespaced, unambiguous classes.   For example, use “.product_list_item” instead of “.product_list li” , and “.h1” instead of “h1”.

    No cascading

    Limit overriding

    CSS Specificity – Specificity is the means by which a browser decides which property values are the most relevant to an element and get to be applied.
        Each CSS rule is assigned a specificity value
        Plot specificity values on a graph where the x-axis represents the line number in the CSS
        Line should be relatively flat, and only trend toward high specificity towards the end of the CSS
        Specificity graph generator: http://jonassebastianohlsson.com/specificity-graph/
        Another option of what a graph should look like: http://snook.ca/archives/html_and_css/specificity-graphs

    Important CSS patterns and concepts
        Namespaces
        Modules
        Prototype
        Revealing Module
        Revealing Prototype

    Optimizing Your Website’s Performance (End-To-End Diagnostics)

    Presenter: http://mitchelsellers.com/
    Session Materials: http://mitchelsellers.com/blogs/2014/11/17/2014-st-louis-days-of-net-presentations.aspx

    If your test environment is different from your production environment, look for linear differences in order to estimate the differences between the servers.  For example, if the production server is a quad-core server and the test server is a dual-core server, measure the performance of the test server twice: once with one core active and once with both cores.  The difference between running with one core vs. two cores should allow you to estimate the difference between the dual-core server and the quad-core server.  Obviously, this will not be perfect, but it does provide some baseline for estimating the differences between servers.

    Different browsers have different limits on how many simultaneous requests can be made to a single domain (varies from 4 to 10).

    Simple stuff to look at when optimizing a web site:
        Large images
        Long-running javascript
        Large viewstate

    Make sure cache-expiration is set correctly for static content.  This is done in the web.config file.

    TOOLS

    Google PageSpeed
        Provides mobile and desktop scores
        Used in Google search rankings!
        Not useful for internal sites
        Similar to YSlow
        Blocked by pages requiring a login

    Google Analytics (or similar)
        Useful for investigating daily loads (determine why site is slow at certain times)
        Use to investigate traffic patterns

    Loader.IO
        Reasonably priced and free options available
        Use to simulate traffic load on your site
        Only tests static html

    LoadStorm
        More expensive
        Use to simulate traffic load
        Tests everything; not just static content

    New Relic
        Internal server monitoring

    Hadoop For The SQL Ninja

    Presenter: https://twitter.com/mwinkle

    Hive is a SQL-like query language for Hadoop.
        Originated at Facebook
        Compiles to Map/Reduce jobs
        Queries tables/catalogs defined on top of underlying data stores
        Data stores can be text files, Mongo, etc
        Data stores just need to provide rows and columns of data
        Custom data providers can be created to provide rows/columns of data

    Hive is good for:
        Large scale queries
        A variety of formats
        UDF extensibility

    Hive is NOT good for:
        Interactive querying
        Small tables
        OLTP

    Hive connectivity
        ODBC/JDBC – responsive queries
        Oozie – job-based workflows
        Powershell
        Azure Toolkit/API – now includes Visual Studio integration for viewing/executing queries

    Angular for .NET Developers

    Presenter: https://twitter.com/jamesbender
    Session Materials: https://github.com/JamesBender/AngularDemos

    AngularJS is a Javascript MVC framework
        Model-View-Controller are all on the client
        Data is exchanged via AJAX calls to REST web services
        Makes use of dependency injection

    Benefits of AngularJS
        Unobtrusive Javascript
        Clean HTML
        Limits the need for third party libraries (like jQuery)
        Works well with ASP.NET MVC
        Easy Single-Page Applications (SPA)
        Testing is easy.  Jasmine is the test framework of choice.

    HTML attributes provide AngularJS “hooks”.  For example, notice the attributes on the elements <html ng-app="AngularApp"> and <input ng-model="user.name" />

    Data binding example:

        <input ng-model="user.name"/>
        <p>Hello {{user.name}}</p>

        In this example, data entered into the input text box is echoed in the paragraph below the input element.

    Making Rich, Interactive, Multi-Platform Applications with SignalR

    Presenter: http://mitchelsellers.com/
    Session Materials: http://mitchelsellers.com/blogs/2014/11/17/2014-st-louis-days-of-net-presentations.aspx

    Use cases for SignalR
        Any application that involves polling
        Chat applications
        Real-time score updates
        Voting results
        Real-time stock prices

    The Smooth Transition to TypeScript

    Presenter: https://twitter.com/pottereric

    TypeScript provides compile-time errors in Visual Studio.

    TypeScript has type-checking
        Optional types on variables and parameters
        Primitive types are number, string, boolean, and any
        The “any” type tells the compiler to treat the variable like Javascript would

    Intellisense for TypeScript is very good, and other typical Visual Studio tooling works as well.

    TypeScript files compile to javascript (example.ts -> example.js), and the javascript is what gets referenced in your web applications.

    TypeScript class definitions become javascript types.

    The usual Visual Studio design and compile-time errors are available when working with classes.

    A NuGet package exists that provides “jQuery typing files” that enable working with jQuery in TypeScript.

    TypeScript supports generics and lambdas.

    Removing Duplicates from MongoDB

    IMPORTANT:  The dropDups option was removed starting with MongoDB 3.x, so this solution is only valid for MongoDB versions 2.x and before.  There is no direct replacement for the dropDups option. The answers to the question posed at http://stackoverflow.com/questions/30187688/mongo-3-duplicates-on-unique-index-dropdups offer some possible alternative ways to remove duplicates in Mongo 3.x.

    Duplicate records can be removed from a MongoDB collection by creating a unique index on the collection and specifying the dropDups option.

    Assuming the collection includes a field named record_id that uniquely identifies a record in the collection, the command to use to create a unique index and drop duplicates is:

    db.collection.ensureIndex( { record_id:1 }, { unique:true, dropDups:true } )

    Here is the trace of a session that shows the contents of a collection before and after creating a unique index with dropDups.  Notice that duplicate records are no longer present after the index is created.

    > db.pages.find()
    { "_id" : ObjectId("52829c886602e2c8428d1d8c"), "leaf_num" : "1", "scan_id" : "smithsoniancont251985smit", "height" : 3464, "width" : 2548 }
    { "_id" : ObjectId("52829c886602e2c8428d1d8d"), "leaf_num" : "1", "scan_id" : "smithsoniancont251985smit", "height" : 3464, "width" : 2548 }
    { "_id" : ObjectId("52829c886602e2c8428d1d8e"), "leaf_num" : "2", "scan_id" : "smithsoniancont251985smit", "height" : 3587, "width" : 2503 }
    { "_id" : ObjectId("52829c886602e2c8428d1d8f"), "leaf_num" : "2", "scan_id" : "smithsoniancont251985smit", "height" : 3587, "width" : 2503 }
    >
    > db.pages.ensureIndex( { scan_id:1, leaf_num:1 }, { unique:true, dropDups:true } )
    >
    > db.pages.find()
    { "_id" : ObjectId("52829c886602e2c8428d1d8c"), "leaf_num" : "1", "scan_id" : "smithsoniancont251985smit", "height" : 3464, "width" : 2548 }
    { "_id" : ObjectId("52829c886602e2c8428d1d8e"), "leaf_num" : "2", "scan_id" : "smithsoniancont251985smit", "height" : 3587, "width" : 2503 }
    >

    More information about creating MongoDB indexes, including the use of the dropDups option, can be found in the MongoDB documentation.

    Directing MongoDB Query Results to a File

    The syntax for querying a MongoDB database and directing the results to a file is:

    mongo server/database --eval "query" >> outputfile

    where “server/database” is the address of the MongoDB database, “query” is the MongoDB command to execute, and “outputfile” is the file to which to direct the output.

    Each of the following examples assumes that MongoDB has been installed in the C:\MongoDB folder.

    Running this command…

    C:\>mongodb\bin\mongo localhost/mydb --eval "db.docs.count()" >> counttest.txt

    … will result in a file named counttest.txt that has contents similar to the following:

    MongoDB shell version: 2.4.8
    connecting to: localhost/mydb
    1979179

    For queries that will return JSON objects, be sure to wrap the command in the printjson() function, as in this example:

    C:\>mongodb\bin\mongo localhost/mydb --eval "printjson(db.docs.findOne())" >> selectone.json

    If the command is not wrapped by printjson, the resulting file will contain something like this:

    MongoDB shell version: 2.4.8
    connecting to: localhost/mydb
    [object Object]

    Notice that instead of the contents of the JSON object, the file simply contains the text [object Object].  However, if printjson is used, then the contents of the file will contain the actual JSON object, as shown here:

    MongoDB shell version: 2.4.8
    connecting to: localhost/mydb
    {
         "_id" : ObjectId("528b7c7b594d11167ecdd1a7"),
         "leaf_num" : "0"
    }

    Here is an example that shows the use of a more complicated command.  It performs an aggregation and returns the results as JSON (notice the use of printjson):

    C:\>mongodb\bin\mongo localhost/mydb --eval "printjson(db.docs.group( { key: {scan_id: 1}, reduce: function(cur, result) {result.count += 1 }, initial: {count:0}}))" >> MongoCount.txt

    For this example, the contents of the MongoCount.txt file are:

    MongoDB shell version: 2.4.8
    connecting to: localhost/mydb
    [
         {
              "scan_id" : "01A23374-4D72-4A06-9B88-EF74D0ACEE5D",
              "count" : 151
         },
         {
              "scan_id" : "0245B979-D3D6-4B01-83E1-E8D1D2ADA255",
              "count" : 250

         },
         {
              "scan_id" : "0330266.0001.002.umich.edu",
              "count" : 164
         },
         {
              "scan_id" : "03polybiblionrevue55sociuoft",
              "count" : 290
         }
    ]