Data Access Framework Comparison

Introduction

For some time now I have been working on a project that utilizes a custom-built data access framework, rather than popular ORM frameworks such as Entity Framework or NHibernate.

While the custom framework has worked well for the project, I had questions about it.  For example, it uses stored procedures to implement basic CRUD operations, and I wondered if inline parameterized SQL statements might perform better.  Also, I wondered about the performance of the custom framework compared to the leading ORMs.

Besides my questions about the custom framework, I recognized the importance of having at least a basic understanding of how to use the other ORM frameworks.

In order to answer my questions about the custom framework and to gain some practical experience with the other ORMs, I created a simple web application that uses each of those frameworks to perform basic CRUD operations.  While executing the CRUD operations, the application times them and produces a summary report of the results.

The code for the test application can be found at https://github.com/mlichtenberg/ORMComparison.

NOTE: I assume that most readers are familiar with the basics of Entity Framework and NHibernate, so I will not provide an overview of them here.

Using the custom framework is similar to Entity Framework and NHibernate’s “database-first” approach.  Any project that uses the library references a single assembly containing the base functionality of the library.  A T4 template is used to generate additional classes based on tables in a SQL Server database.  Some of the classes are similar to EF’s Model classes and NHibernate’s Domain classes.  The others provide the basic CRUD functionality for the domain/model classes. 

For these tests I made a second copy of the custom framework classes that provide the basic CRUD functionality, and edited them to replace the CRUD stored procedures with parameterized SQL statements.
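
To make the difference concrete, here is a minimal sketch of what the two variants of an insert method might look like in plain ADO.NET.  The class name, method names, connection string, and table shape are illustrative, not the actual generated code, and the stored procedure variant assumes the procedure SELECTs the new identity value.

    using System;
    using System.Data;
    using System.Data.SqlClient;

    public class EnrollmentDAL
    {
        private const string ConnectionString = "...";  // hypothetical connection string

        // Variant 1: INSERT via a stored procedure
        public int InsertViaStoredProcedure(int courseId, int studentId, decimal? grade)
        {
            using (SqlConnection conn = new SqlConnection(ConnectionString))
            using (SqlCommand cmd = new SqlCommand("EnrollmentInsert", conn))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.Parameters.AddWithValue("@CourseID", courseId);
                cmd.Parameters.AddWithValue("@StudentID", studentId);
                cmd.Parameters.AddWithValue("@Grade", (object)grade ?? DBNull.Value);
                conn.Open();
                return Convert.ToInt32(cmd.ExecuteScalar());  // the new EnrollmentID
            }
        }

        // Variant 2: the same INSERT as an inline parameterized SQL statement
        public int InsertViaParameterizedSql(int courseId, int studentId, decimal? grade)
        {
            const string sql =
                "INSERT INTO Enrollment (CourseID, StudentID, Grade) " +
                "VALUES (@CourseID, @StudentID, @Grade); " +
                "SELECT CAST(SCOPE_IDENTITY() AS int);";

            using (SqlConnection conn = new SqlConnection(ConnectionString))
            using (SqlCommand cmd = new SqlCommand(sql, conn))
            {
                cmd.CommandType = CommandType.Text;
                cmd.Parameters.AddWithValue("@CourseID", courseId);
                cmd.Parameters.AddWithValue("@StudentID", studentId);
                cmd.Parameters.AddWithValue("@Grade", (object)grade ?? DBNull.Value);
                conn.Open();
                return Convert.ToInt32(cmd.ExecuteScalar());
            }
        }
    }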

The custom framework includes much less overhead on top of ADO.NET than the popular ORMs, so I expected the tests to show that it was the best-performing framework.  The question was, how much better?

In the rest of this post, I will describe the results of my experiment, as well as some of the optimization tips I learned along the way.  Use the following links to jump directly to a topic.

Test Application Overview
“Out-of-the-Box” Performance
Entity Framework Performance After Code Optimization
     AutoDetectChangesEnabled and DetectChanges()
     Recycling the DbContext
NHibernate Performance After Configuration Optimization
     What’s Up with Update Performance in NHibernate?
Results Summary

Test Application Overview

A SQL Express database was used for the tests.  The data model is borrowed from Microsoft’s Contoso University sample application.  Here is the ER diagram for the database:

[Image: ER diagram for the Contoso University database]

The database was pre-populated with sample data.  The number of rows added to each table was as follows:

Department: 20
Course: 200
Person: 100000
Enrollment: 200000

This was done because SQL Server’s optimizer will behave differently with an empty database than it will with a database containing data, and I wanted the database to respond as it would in a “real-world” situation.  For the tests, all CRUD operations were performed against the Enrollment table.

Five different data access frameworks were tested:

  1. Custom framework with stored procedures
  2. Custom framework with parameterized SQL statements
  3. Entity Framework
  4. NHibernate
  5. Fluent NHibernate

The testing algorithm follows the same pattern for each of the frameworks:

01) Start timer
02) For a user-specified number of iterations 
03)      Submit an INSERT statement to the database
04)      Save the identifier of the new database record
05) End timer
06) Start timer
07) For each new database record identifier
08)      Submit a SELECT statement to the database
09) End timer
10) Start timer
11) For each new database record identifier
12)      Submit an UPDATE statement to the database
13) End timer
14) Start timer
15) For each new database record identifier
16)      Submit a DELETE statement to the database
17) End timer

Note that after the test algorithm completes, the database is in the same state as when the tests began.
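
Each timed block in the test harness looks roughly like the following sketch, which uses System.Diagnostics.Stopwatch and assumes a hypothetical InsertOne() helper that performs a single framework-specific INSERT.

    // Minimal sketch of one timed block (InsertOne() is a hypothetical helper);
    // requires using System, System.Collections.Generic, and System.Diagnostics
    List<int> ids = new List<int>();
    Stopwatch stopwatch = Stopwatch.StartNew();

    for (int x = 0; x < Convert.ToInt32(iterations); x++)
    {
        int newId = InsertOne();   // perform one framework-specific INSERT
        ids.Add(newId);            // save the identifier of the new record
    }

    stopwatch.Stop();
    Console.WriteLine("INSERT x {0}: {1} seconds",
        iterations, stopwatch.Elapsed.TotalSeconds);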

To see the actual code, visit https://github.com/mlichtenberg/ORMComparison/blob/master/MVCTestHarness/Controllers/TestController.cs.

"Out-of-the-Box" Performance

I first created very basic tests for each framework. Essentially, these were the “Hello World” versions of the CRUD code for each framework.  No optimization was attempted.

Here is an example of the code that performs the INSERTs for the custom framework.  There is no difference between the version with stored procedures and the version without, other than the namespace from which EnrollmentDAL is instantiated.

    DA.EnrollmentDAL enrollmentDAL = new DA.EnrollmentDAL();

    for (int x = 0; x < Convert.ToInt32(iterations); x++)
    {
        DataObjects.Enrollment enrollment = enrollmentDAL.EnrollmentInsertAuto
            (null, null, 101, 1, null);
        ids.Add(enrollment.EnrollmentID);
    }

And here is the equivalent code for Entity Framework:

    using (SchoolContext db = new SchoolContext())
    {
        for (int x = 0; x < Convert.ToInt32(iterations); x++)
        {
            Models.Enrollment enrollment = new Models.Enrollment {
                CourseID = 101, StudentID = 1, Grade = null };
            db.Enrollments.Add(enrollment);
            db.SaveChanges();
            ids.Add(enrollment.EnrollmentID);
        }
    }

The code for NHibernate and Fluent NHibernate is almost identical.  Here is the NHibernate version:

    using (var session = NH.NhibernateSession.OpenSession("SchoolContext"))
    {
        var course = session.Get<NHDomain.Course>(101);
        var student = session.Get<NHDomain.Person>(1);

        for (int x = 0; x < Convert.ToInt32(iterations); x++)
        {
            var enrollment = new NHDomain.Enrollment {
                Course = course, Person = student, Grade = null };
            session.SaveOrUpdate(enrollment);
            ids.Add(enrollment.Enrollmentid);
        }
    }

The SELECT, UPDATE, and DELETE code for each framework followed similar patterns. 
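
For example, here is roughly what the Entity Framework version looks like, condensed into a single loop for brevity.  In the actual tests each operation runs in its own timed loop, and the updated value is illustrative.

    using (SchoolContext db = new SchoolContext())
    {
        foreach (int id in ids)
        {
            // SELECT: load the entity by primary key
            Models.Enrollment enrollment = db.Enrollments.Find(id);

            // UPDATE: change a property and save
            enrollment.Grade = null;   // illustrative; the tests assign an updated value
            db.SaveChanges();

            // DELETE: remove the entity and save
            db.Enrollments.Remove(enrollment);
            db.SaveChanges();
        }
    }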

NOTE: A SQL Server Profiler trace proved that the actual interactions with the database were the same for each framework.  The same database connections were established, and equivalent CRUD statements were submitted by each framework.  Therefore, any measured differences in performance are due to the overhead of the frameworks themselves.

Here are the results of the tests of the “out-of-the-box” code:

      Framework              Operation     Elapsed Time (seconds)
      Custom                 Insert        5.9526039
      Custom                 Select        1.9980745
      Custom                 Update        5.0850357
      Custom                 Delete        3.7785886

      Custom (no SPs)        Insert        5.2251725
      Custom (no SPs)        Select        2.0028176
      Custom (no SPs)        Update        4.5381994
      Custom (no SPs)        Delete        3.7064278

      Entity Framework       Insert        1029.5544975
      Entity Framework       Select        8.6153572
      Entity Framework       Update        2362.7183765
      Entity Framework       Delete        25.6118191

      NHibernate             Insert        9.9498188
      NHibernate             Select        7.3306331
      NHibernate             Update        274.7429862
      NHibernate             Delete        12.4241886

      Fluent NHibernate      Insert        11.796126
      Fluent NHibernate      Select        7.3961941
      Fluent NHibernate      Update        283.1575124
      Fluent NHibernate      Delete        10.791648

NOTE: For all tests, each combination of Framework and Operation was executed 10000 times.  Looking at the first line of the preceding results, this means that the custom framework took about 5.95 seconds to perform 10000 INSERTs.

As you can see, both versions of the custom framework outperformed Entity Framework and NHibernate.  In addition, the version of the custom framework that used parameterized SQL was very slightly faster than the version that used stored procedures.  Most interesting, however, was the performance of the INSERT and UPDATE operations.  Entity Framework and both versions of NHibernate were not just worse than the two custom framework versions, they were much, MUCH worse.  Clearly, some optimization and/or configuration changes were needed.

Entity Framework Performance After Code Optimization

AutoDetectChangesEnabled and DetectChanges()

It turns out that much of Entity Framework’s poor performance appears to have been due to the nature of the tests themselves.  Information on Microsoft’s MSDN website notes that if you are tracking a lot of objects in your DbContext and call methods like Add() and SaveChanges() many times in a loop, your performance may suffer.  That scenario describes these tests almost perfectly.

The solution is to turn off Entity Framework’s automatic detection of changes by setting AutoDetectChangesEnabled to false, and to call DetectChanges() explicitly whenever changes need to be detected.  Here is what the updated code for performing INSERTs with Entity Framework looks like; the new lines are the AutoDetectChangesEnabled setting and the DetectChanges() call:

      using (SchoolContext db = new SchoolContext())
      {
          db.Configuration.AutoDetectChangesEnabled = false;

          for (int x = 0; x < Convert.ToInt32(iterations); x++)
          {
              Models.Enrollment enrollment = new Models.Enrollment {
                  CourseID = 101, StudentID = 1, Grade = null };
              db.Enrollments.Add(enrollment);
              db.ChangeTracker.DetectChanges();
              db.SaveChanges();
              ids.Add(enrollment.EnrollmentID);
          }
      }

Here are the results of tests with AutoDetectChangesEnabled set to false:

      Framework           Operation    Elapsed Time (seconds)
      Entity Framework    Insert       606.5569332
      Entity Framework    Select       6.4425741
      Entity Framework    Update       605.6206616
      Entity Framework    Delete       21.0813293

As you can see, INSERT and UPDATE performance improved significantly, and SELECT and DELETE performance also improved slightly.

Note that turning off AutoDetectChangesEnabled and calling DetectChanges() explicitly will slightly improve Entity Framework’s performance in all cases.  However, forgetting to call DetectChanges() when it is needed can cause subtle bugs.  Therefore, it is best to use this optimization technique only in very specific scenarios, and to allow the default behavior otherwise.
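
One way to limit that risk is to scope the change and restore the setting when the bulk operation completes.  A minimal sketch:

    using (SchoolContext db = new SchoolContext())
    {
        bool oldValue = db.Configuration.AutoDetectChangesEnabled;
        try
        {
            db.Configuration.AutoDetectChangesEnabled = false;
            // ... perform the bulk Add()/SaveChanges() work here ...
        }
        finally
        {
            // Restore the previous behavior for any later use of the context
            db.Configuration.AutoDetectChangesEnabled = oldValue;
        }
    }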

Recycling the DbContext

While Entity Framework performance certainly improved by changing the AutoDetectChangesEnabled value, it was still relatively poor.

Another problem with the tests is that the same DbContext was used for every iteration of an operation (i.e. one DbContext object was used for all 10000 INSERT operations).  This is a problem because the context maintains a record of all entities added to it during its lifetime.  The effect was a gradual slowdown of the INSERT (and UPDATE) operations as more and more entities were added to the context.

Here is what the Entity Framework INSERT code looks like after modifying it to periodically create a new context:

      int total = Convert.ToInt32(iterations);
      int x = 0;
      while (x < total)
      {
          // Use a new context after every 100 Insert operations
          using (SchoolContext db = new SchoolContext())
          {
              db.Configuration.AutoDetectChangesEnabled = false;

              for (int count = 0; count < 100 && x < total; count++, x++)
              {
                  Models.Enrollment enrollment = new Models.Enrollment {
                      CourseID = 101, StudentID = 1, Grade = null };
                  db.Enrollments.Add(enrollment);
                  db.ChangeTracker.DetectChanges();
                  db.SaveChanges();
                  ids.Add(enrollment.EnrollmentID);
              }
          }
      }

And here are the results of the Entity Framework tests with the additional optimization added:

      Framework            Operation     Elapsed Time (seconds)
      Entity Framework     Insert        14.7847024
      Entity Framework     Select        5.5516514
      Entity Framework     Update        13.823694
      Entity Framework     Delete        10.0770142

Much better!  The time to perform the SELECT operations was little changed, but the DELETE time was reduced by half, and the INSERT and UPDATE times decreased from a little more than ten minutes each to about 14 seconds.

NHibernate Performance After Configuration Optimization

For the NHibernate frameworks, the tests themselves were not the problem; NHibernate itself needed some tuning.

An optimized solution was achieved by changing the configuration settings of the NHibernate Session object.  Here is the definition of the SessionFactory for NHibernate, with the optimization settings added via the block of SetProperty calls:

      private static ISessionFactory SessionFactory
      {
          get
          {
              if (_sessionFactory == null)
              {
                  string connectionString = ConfigurationManager.ConnectionStrings
                      [_connectionKeyName].ToString();

                  var configuration = new NHConfig.Configuration();
                  configuration.Configure();

                  configuration.SetProperty(NHConfig.Environment.ConnectionString,
                      connectionString);

                  configuration.SetProperty(NHibernate.Cfg.Environment.FormatSql,
                      Boolean.FalseString);
                  configuration.SetProperty
                     (NHibernate.Cfg.Environment.GenerateStatistics,
                          Boolean.FalseString);
                  configuration.SetProperty
                     (NHibernate.Cfg.Environment.Hbm2ddlKeyWords,
                          NHConfig.Hbm2DDLKeyWords.None.ToString());
                  configuration.SetProperty(NHibernate.Cfg.Environment.PrepareSql,
                          Boolean.TrueString);
                  configuration.SetProperty
                      (NHibernate.Cfg.Environment.PropertyBytecodeProvider,
                          "lcg");
                  configuration.SetProperty
                      (NHibernate.Cfg.Environment.PropertyUseReflectionOptimizer,
                          Boolean.TrueString);
                  configuration.SetProperty
                      (NHibernate.Cfg.Environment.QueryStartupChecking,
                          Boolean.FalseString);
                  configuration.SetProperty(NHibernate.Cfg.Environment.ShowSql, 
                      Boolean.FalseString);
                  configuration.SetProperty
                      (NHibernate.Cfg.Environment.UseProxyValidator, 
                          Boolean.FalseString);
                  configuration.SetProperty
                      (NHibernate.Cfg.Environment.UseSecondLevelCache,
                          Boolean.FalseString);

                  configuration.AddAssembly(typeof(Enrollment).Assembly);
                  _sessionFactory = configuration.BuildSessionFactory();
              }
              return _sessionFactory;
          }
      }

And here is the InitializeSessionFactory method for Fluent NHibernate, with the equivalent changes included:

      private static void InitializeSessionFactory()
      {
          string connectionString = ConfigurationManager.ConnectionStrings[_connectionKeyName]
              .ToString();

          _sessionFactory = Fluently.Configure()
              .Database(MsSqlConfiguration.MsSql2012.ConnectionString(connectionString).ShowSql())
              .Mappings(m => m.FluentMappings.AddFromAssemblyOf<Enrollment>())
              .BuildConfiguration().SetProperty
                  (NHibernate.Cfg.Environment.FormatSql, Boolean.FalseString)
              .SetProperty(NHibernate.Cfg.Environment.GenerateStatistics,
                  Boolean.FalseString)
              .SetProperty(NHibernate.Cfg.Environment.Hbm2ddlKeyWords,
                  NHibernate.Cfg.Hbm2DDLKeyWords.None.ToString())
              .SetProperty(NHibernate.Cfg.Environment.PrepareSql,
                  Boolean.TrueString)
              .SetProperty(NHibernate.Cfg.Environment.PropertyBytecodeProvider,
                  "lcg")
              .SetProperty
                  (NHibernate.Cfg.Environment.PropertyUseReflectionOptimizer,
                      Boolean.TrueString)
              .SetProperty(NHibernate.Cfg.Environment.QueryStartupChecking,
                  Boolean.FalseString)
              .SetProperty(NHibernate.Cfg.Environment.ShowSql, Boolean.FalseString)
              .SetProperty(NHibernate.Cfg.Environment.UseProxyValidator,
                  Boolean.FalseString)
              .SetProperty(NHibernate.Cfg.Environment.UseSecondLevelCache,
                  Boolean.FalseString)
              .BuildSessionFactory();
      }

The following table gives a brief description of the purpose of these settings:

      Setting                   Purpose
      FormatSql                 Format the SQL before sending it to the database
      GenerateStatistics        Produce statistics on the operations performed
      Hbm2ddlKeyWords           Whether NHibernate should automatically quote all db object names
      PrepareSql                Compiles the SQL before executing it
      PropertyBytecodeProvider  What bytecode provider to use for the generation of code
      QueryStartupChecking      Check all named queries present in the startup configuration
      ShowSql                   Show the produced SQL
      UseProxyValidator         Validate that mapped entities can be used as proxies
      UseSecondLevelCache       Enable the second level cache

Notice that several of these settings (FormatSql, GenerateStatistics, ShowSql) are most useful for debugging.  It is not clear why they are enabled by default in NHibernate; it seems to me that these should be opt-in settings, rather than opt-out.

Here are the results of tests of the NHibernate frameworks with these changes in place:

      Framework                        Operation     Elapsed Time (seconds)
      NHibernate (Optimized)           Insert        5.0894047
      NHibernate (Optimized)           Select        5.2877312
      NHibernate (Optimized)           Update        133.9417387
      NHibernate (Optimized)           Delete        5.6669841

      Fluent NHibernate (Optimized)    Insert        5.0175024
      Fluent NHibernate (Optimized)    Select        5.2698945
      Fluent NHibernate (Optimized)    Update        128.3563561
      Fluent NHibernate (Optimized)    Delete        5.5299521

These results are much improved, with the INSERT, SELECT, and DELETE operations nearly matching the results achieved by the custom framework.  The UPDATE performance, while improved, is still relatively poor.

What’s Up with Update Performance in NHibernate?

The poor UPDATE performance is a mystery to me.  I have researched NHibernate optimization techniques and configuration settings, and have searched for other people reporting problems with UPDATE operations.  Unfortunately, I have not been able to find a solution.

This is disappointing, as I personally found NHibernate more comfortable to work with than Entity Framework, and because it beats or matches the performance of Entity Framework for SELECT, INSERT, and DELETE operations.

If anyone out there knows of a solution, please leave a comment!

Results Summary

The following table summarizes the results of the tests using the optimal configuration for each framework.  These are the same results shown earlier in this post, combined here in a single table.

      Framework                        Operation     Elapsed Time (seconds)
      Custom                           Insert        5.9526039
      Custom                           Select        1.9980745
      Custom                           Update        5.0850357
      Custom                           Delete        3.7785886

      Custom (no SPs)                  Insert        5.2251725
      Custom (no SPs)                  Select        2.0028176
      Custom (no SPs)                  Update        4.5381994
      Custom (no SPs)                  Delete        3.7064278

      Entity Framework (Optimized)     Insert        14.7847024
      Entity Framework (Optimized)     Select        5.5516514
      Entity Framework (Optimized)     Update        13.823694
      Entity Framework (Optimized)     Delete        10.0770142

      NHibernate (Optimized)           Insert        5.0894047
      NHibernate (Optimized)           Select        5.2877312
      NHibernate (Optimized)           Update        133.9417387
      NHibernate (Optimized)           Delete        5.6669841

      Fluent NHibernate (Optimized)    Insert        5.0175024
      Fluent NHibernate (Optimized)    Select        5.2698945
      Fluent NHibernate (Optimized)    Update        128.3563561
      Fluent NHibernate (Optimized)    Delete        5.5299521

And here is a graph showing the same information:

[Image: graph summarizing the results for all frameworks]

    hOCRImageMapper: A Tool For Visualizing hOCR Files

    Just uploaded to GitHub (https://github.com/mlichtenberg/hocrimagemapper), this simple application provides a way to visualize hOCR output.

    Per Wikipedia: "hOCR is an open standard of data representation for formatted text obtained from optical character recognition (OCR). The definition encodes text, style, layout information, recognition confidence metrics and other information using Extensible Markup Language (XML) in form of Hypertext Markup Language (HTML) or XHTML."

    hOCR is produced by OCR software such as Tesseract, Cuneiform, and OCRopus.  My motivation for creating this tool was a need to analyze hOCR output produced by Tesseract.

    The application is implemented as a simple WinForms application (yeah, I know, but it was quick) written in C#.

    When using the application, the text contained in an hOCR file is loaded alongside the image that is the source of the OCR output.  Hovering over a word in the text highlights the word in the image. 

    [Image: screenshot of hOCRImageMapper]
    Hovering over the word “quantitative” in the left panel highlights the word in the source image on the right.

    Clicking a word in the text displays the coordinates of the bounding box used to highlight the word.  (This bounding box is extracted from the hOCR output.)  The coordinates are displayed as two pairs of X-Y coordinates that represent the upper left and lower right corners of the bounding box.
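
    In hOCR, a word’s bounding box comes from the title attribute of its span element, e.g. <span class='ocrx_word' title='bbox 513 540 846 600'>quantitative</span>, where the four numbers are the left, top, right, and bottom edges.  Here is a minimal sketch of extracting the box in C#; the method name is illustrative:

        // Requires using System.Drawing and System.Text.RegularExpressions
        static Rectangle GetBoundingBox(string hocrSpan)
        {
            // hOCR stores "bbox x0 y0 x1 y1" = left, top, right, bottom
            Match m = Regex.Match(hocrSpan, @"bbox (\d+) (\d+) (\d+) (\d+)");
            int x0 = int.Parse(m.Groups[1].Value);
            int y0 = int.Parse(m.Groups[2].Value);
            int x1 = int.Parse(m.Groups[3].Value);
            int y1 = int.Parse(m.Groups[4].Value);
            return new Rectangle(x0, y0, x1 - x0, y1 - y0);
        }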

    [Image: screenshot showing word coordinates]
    Clicking the word displays its coordinates.  In this case, the X-Y pairs are (513, 540) for the upper left and (846, 600) for the lower right.

    The source code can be downloaded from the GitHub repository, or the compiled executable can be downloaded directly.

    St. Louis Days of .NET 2014

    My notes from the 2014 edition of St. Louis Days of .NET.  I was only able to attend the first day of the conference this year.

    Front-End Design Patterns: SOLID CSS + JS for Backend Developers

    Presenter: https://twitter.com/anthony_vdh
    Session Materials:  http://vimeo.com/97315940

    Use namespaced, unambiguous classes.  For example, use ".product_list_item" instead of ".product_list li", and ".h1" instead of "h1".

    No cascading

    Limit overriding

    CSS Specificity – Specificity is the means by which a browser decides which property values are the most relevant to an element and get to be applied.
        Each CSS rule is assigned a specificity value
        Plot specificity values on a graph where the x-axis represents the line number in the CSS
        Line should be relatively flat, and only trend toward high specificity towards the end of the CSS
        Specificity graph generator: http://jonassebastianohlsson.com/specificity-graph/
        Another option of what a graph should look like: http://snook.ca/archives/html_and_css/specificity-graphs

    Important CSS patterns and concepts
        Namespaces
        Modules
        Prototype
        Revealing Module
        Revealing Prototype

    Optimizing Your Website’s Performance (End-To-End Diagnostics)

    Presenter: http://mitchelsellers.com/
    Session Materials: http://mitchelsellers.com/blogs/2014/11/17/2014-st-louis-days-of-net-presentations.aspx

    If your test environment is different from your production environment, look for linear differences in order to estimate the differences between the servers.  For example, if the production server is a quad-core server and the test server is a dual-core server, measure the performance of the test server twice: once with one core active and once with both cores.  The difference between running with one core vs. two cores should allow you to estimate the difference between the dual-core server and the quad-core server.  Obviously, this will not be perfect, but it does provide some baseline for estimating the differences between servers.

    Different browsers have different limits on how many simultaneous requests can be made to a single domain (varies from 4 to 10).

    Simple stuff to look at when optimizing a web site:
        Large images
        Long-running javascript
        Large viewstate

    Make sure cache-expiration is set correctly for static content.  This is done in the web.config file.

    TOOLS

    Google PageSpeed
        Provides mobile and desktop scores
        Used in Google search rankings!
        Not useful for internal sites
        Similar to YSlow
        Blocked by pages requiring a login

    Google Analytics (or similar)
        Useful for investigating daily loads (determine why site is slow at certain times)
        Use to investigate traffic patterns

    Loader.IO
        Reasonably priced and free options available
        Use to simulate traffic load on your site
        Only tests static html

    LoadStorm
        More expensive
        Use to simulate traffic load
        Tests everything; not just static content

    New Relic
        Internal server monitoring

    Hadoop For The SQL Ninja

    Presenter: https://twitter.com/mwinkle

    Hive is a SQL-like query language for Hadoop.
        Originated at Facebook
        Compiles to Map/Reduce jobs
        Queries tables/catalogs defined on top of underlying data stores
        Data stores can be text files, Mongo, etc
        Data stores just need to provide rows and columns of data
        Custom data providers can be created to provide rows/columns of data

    Hive is good for:
        Large scale queries
        A variety of formats
        UDF extensibility

    Hive is NOT good for:
        Interactive querying
        Small tables
        OLTP

    Hive connectivity
        ODBC/JDBC – responsive queries
        Oozie – job-based workflows
        Powershell
        Azure Toolkit/API – now includes Visual Studio integration for viewing/executing queries

    Angular for .NET Developers

    Presenter: https://twitter.com/jamesbender
    Session Materials: https://github.com/JamesBender/AngularDemos

    AngularJS is a Javascript MVC framework
        Model-View-Controller are all on the client
        Data is exchanged via AJAX calls to REST web services
        Makes use of dependency injection

    Benefits of AngularJS
        Unobtrusive Javascript
        Clean HTML
        Limits the need for third party libraries (like jQuery)
        Works well with ASP.NET MVC
        Easy Single-Page Applications (SPA)
        Testing is easy.  Jasmine is the test framework of choice.

    HTML attributes provide AngularJS “hooks”.  For example, notice the attributes on the elements <html ng-app="AngularApp"> and <input ng-model="user.name" />

    Data binding example:

        <input ng-model="user.name"/>
        <p>Hello {{user.name}}</p>

        In this example, data entered into the input text box is echoed in the paragraph below the input element.

    Making Rich, Interactive, Multi-Platform Applications with SignalR

    Presenter: http://mitchelsellers.com/
    Session Materials: http://mitchelsellers.com/blogs/2014/11/17/2014-st-louis-days-of-net-presentations.aspx

    Use cases for SignalR
        Any application that involves polling
        Chat applications
        Real-time score updates
        Voting results
        Real-time stock prices

    The Smooth Transition to TypeScript

    Presenter: https://twitter.com/pottereric

    TypeScript provides compile-time errors in Visual Studio.

    TypeScript has type-checking
        Optional types on variables and parameters
        Primitive types are number, string, boolean, and any
        The “any” type tells the compiler to treat the variable like Javascript would

    Intellisense for TypeScript is very good, and other typical Visual Studio tooling works as well.

    TypeScript files compile to javascript (example.ts –> example.js), and the javascript is what gets referenced in your web applications.

    TypeScript class definitions become javascript types.

    The usual Visual Studio design and compile-time errors are available when working with classes.

    A NuGet package exists that provides “jQuery typing files” that enable working with jQuery in TypeScript.

    TypeScript supports generics and lambdas.

    St. Louis Day of .NET 2013

    This post is long overdue, as the 2013 Day of .NET took place almost two months ago.  I set aside my notes while I waited for presenters to post their session materials online… and then I forgot about it.  So, without further ado, here are my notes from the event:

    DAY 1

    Session: Entity Framework in the Enterprise

    Presenter: http://www.twitter.com/scottkuhl
    Session Materials: https://skydrive.live.com/?cid=b11213b176cf5d39&id=B11213B176CF5D39%2148854

    Getting Started with Entity Framework
         http://www.asp.net/mvc/tutorials/getting-started-with-ef-using-mvc (EF6 and MVC5)
         http://www.asp.net/mvc/tutorials/getting-started-with-ef-5-using-mvc-4 (EF5 and MVC4)

    SQL Server Data Tools 
         Use LocalDB 
         Allows for loading of test data 
         Allows for data to be "reset" to a known state 
         Remember to check the "Target Connection String" in the DB project properties dialog

    Entity Framework Power Tools v.4 (Beta)
         http://msdn.microsoft.com/en-us/data/jj593170.aspx
         Provides reverse engineering of databases into code-first classes, using the Fluent API

    Unit Testing
         Entity Framework 6 has support for mocking frameworks
         Allows you to create your own test doubles
         It is recommended to test against a "real" DB for Last Mile test and performance tests

    Audit Tracking
         SQL Server Change Data Capture
              Available in SQL Server 2008 and beyond (Enterprise Editions only)
              Uses change tables that mirror structure of tables being tracked
              Populates the change tables by analyzing the transaction log (not via triggers)
         If using EF natively
              Override the "SaveChanges" methods
              Loop through the contents of the "ChangeTracker" collection (saving the details along the way)

    Performance Tracking
         Entity Framework 6 includes/allows logging of SQL statements and execution times
         Other useful tools include NLog and Glimpse

    Session:  Introduction to MongoDB

    Presenter: http://www.twitter.com/bradurani

    Background
         6th most popular database in the world, just behind PostgreSQL and DB2
         There are drivers for many languages, as well as a LINQ provider.
         Data stored as BSON (binary JSON)
         Everything is case-sensitive

    Benefits
         Speed – basic queries are much faster than SQL DBs
         Rich Dynamic Queries – not as limited as other NoSQL DBs
         Easy Replication and Failover
         Scalability
         Automatic Sharding

    Limitations
         No transactions
         No joins
         RAM intensive
         No referential integrity
         "Eventual consistency" – periods of inconsistency usually measured in milliseconds

    UIs
         MongoDB shell (command line)
         Various GUI tools

    Querying
         Can query by regular expression
         Can return entire records or specific fields

    Object IDs
         Object IDs (auto-generated unique IDs) contain timestamp of record creation.
         Timestamps contained in Object IDs can be retrieved.
         Can define your own IDs, which is useful for sharding

    Indexing
         Can index pretty much any part (or parts) of a record, up to and including the entire record

    Replication
         If the primary fails, a secondary is auto-elected as the new primary

    Session:  Modern Web Diagnostics with A Glimpse into ASP.NET

    Presenter: http://www.twitter.com/anthony_vdh

    Background
         Installed via NuGet
         New versions are released approximately every two weeks

    Functionality
         Gives insight into ASP.NET, WebForms, and others
         Gives diagnostics on networks, databases, page lifecycle, viewstate, and more
         Can trace individual users
         Can be enabled/disabled in various ways (cookies, roles, etc)
         Keeps the history of the last 50 requests, so recent requests can be examined after they occur

    Platform Support
         Cross browser (latest versions of browsers supported) and cross platform
         Support exists for tracing NHibernate, Entity Framework, MVC, WebForms
         WebAPI support is on the way (not there now)

    Session:  Parallelism in .NET

    Presenter: http://www.twitter.com/mulbud
    Session Materials: http://geekswithblogs.net/NewThingsILearned/archive/2013/11/21/st.-louis-day-of-.net-2013.aspx

    Threads
         More threads means more memory usage and more context switching
         Developers need to find the appropriate balance between the # of threads and resource usage
         Available since .NET 1.0

    ThreadPool
         Similar to database connection pooling
         Resources are managed much better
         Available since .NET 1.0

    Parallel Linq (PLINQ)
         Example: from r in object.AsParallel() select r
         When using this, you must watch out for shared resources, and lock them correctly
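
     A minimal PLINQ sketch (requires using System.Linq):

          int[] numbers = Enumerable.Range(1, 1000).ToArray();

          // AsParallel() allows the query to run across multiple cores
          var evens = (from n in numbers.AsParallel()
                       where n % 2 == 0
                       select n).ToList();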

    Parallel Library
         Provides the parallel For, ForEach, and Invoke methods
         Allows processing to be stopped via the "ParallelLoopState" object

    Tasks (TPL => Task Parallel Library)
         The most complex option to use, but also the most flexible
         Can be used "as needed"; they are not bound to the loop processing of the Parallel Library
         Allow parallel processes to be stopped
         Necessary for the use of Await/Async

    Await/Async
         See the slide deck for the details of how "await" works
         Async methods must return Task
         Await can always be used on a Task, whether it is "async" or not
         When calling an async method, always await it (best practice)
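
     A minimal sketch of the pattern (requires using System and System.Threading.Tasks):

          // An async method returns Task (or Task<T>)
          static async Task<int> GetScoreAsync()
          {
              await Task.Delay(100);   // stand-in for real asynchronous work
              return 42;
          }

          static async Task RunAsync()
          {
              int score = await GetScoreAsync();   // best practice: always await the call
              Console.WriteLine(score);
          }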

    Debugging Support
         When a breakpoint is hit, all running tasks stop
         Several parallel debugging windows are available under the Visual Studio "Debug" menu
              Tasks – shows all running tasks; click a task to go to the currently executing statement
              Parallel Stacks – visual display of running tasks and the call stack; click a task to see the current statement
              Parallel Watch – allows watching a variable in a particular task

    Session:  A Deeper Dive Into Xamarin.Android

    Presenter: http://www.twitter.com/benjamminstl or http://www.twitter.com/rendrstl
    Session Materials: http://www.slideshare.net/BenjaminBishop1/deep-dive-xamarinandroid

    Tools
         Xamarin Studio (native) – not free
         Xamarin Studio plug-in for Visual Studio – not free
         PhoneGap

    Recommended components for easing cross-platform development:
         Xamarin.Mobile – abstracts your code for location/photos/contacts across all platforms
         Xamarin.Social – similar to the Mobile component, only for social services
         Xamarin.Auth – makes OAuth easier to use
         RestSharp
         SQLite.NET
         JSON.NET

    Components for Android
         Google Play Services
         Backward compatibility component (for supporting older versions of Android)

    Frameworks
         MvvmCross
         MonkeyArms

    Notes about developing for Android
         Turn on Hardware Acceleration in the application manifest
         Activity (app) lifecycle events reminiscent of ASP.NET page lifecycle events (or Windows 8 app events)
         Lots of XML involved in app creation
         "Layouts" are used to create app UIs.  Reminiscent of XAML.

    Experience
         Android SDK is more robust and complicated than iOS
         Not as prescriptive in UI/design
         Device fragmentation is a challenge
         Emulators are poor; use a real device for testing
         Platform is more innovative than iOS, but not as polished

    DAY 2

    Session:  All You Ever Wanted to Know About Hadoop

    Presenter: Matt Winkler

    Written in Java (runs on the JVM)

    Installation Options
         Single computer
              HDInsight (Microsoft’s implementation) can be installed from Web Platform Installer
         Cluster
              Various installation packages
         Cloud
              Azure – multiple nodes running HDInsight can be easily provisioned
              Amazon Cloud Services

    MapReduce is the tool for querying data with Hadoop
         White Paper: Data-Intensive Text Processing with MapReduce (http://lintool.github.io/MapReduceAlgorithms/)
         MapReduce can be thought of as the assembly language for Hadoop.

    Extensions to MapReduce
         Most of these compile down to MapReduce packages

         Hive – SQL-like query language
         Pig – another query platform
         SCALDING – Scala-like query language.  The syntax is LINQ-like.
        
    Other Tools
         SQOOP – used for loading traditional RDBMS data into Hadoop
         STORM – tool for complex event processing
         OOZIE – Workflow Management for Hadoop

    Session:  Building A REST API With Node.js and MongoDB

    Presenter: http://www.twitter.com/leebrandt
    Session Materials: https://github.com/leebrandt/ReSTNodeMongo

    Useful Node.js packages (similar to NuGet packages in .NET)
         Restify – adds REST capabilities
         Underscore
         Toaster – UI functionality
         Moment – date handling
         MongoDB – MongoDB client tools

    WebStorm from JetBrains is a recommended Javascript editor ($49 individual developer license)

    http://education.mongodb.com – offers *free* online course on MongoDB

    Session:  Starting with Code-First Entity Framework

    Presenter:  http://www.twitter.com/mulbud
    Session materials: http://geekswithblogs.net/NewThingsILearned/archive/2013/11/21/st.-louis-day-of-.net-2013.aspx

    Create a class that inherits from DbContext… within that class, define the tables to create

    Create classes to represent each table

    Database is created automatically the first time that it is accessed
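
    A minimal sketch (class names are illustrative):

        using System.Data.Entity;

        public class Blog
        {
            public int BlogId { get; set; }
            public string Name { get; set; }
        }

        public class BloggingContext : DbContext
        {
            public DbSet<Blog> Blogs { get; set; }   // defines the table to create
        }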

    Handling DB changes
         1) Update the code
         2) Via attributes, databases can be set to drop/create always, drop/create only when the model changes
         3) Database migrations are another option

    Database Migrations
         Package Manager Console can be used to generate classes to handle migrations. 
         Alternately, create a Configuration class in a Migrations folder
         Use the Configuration class with the MigrateDatabaseToLatestVersion class in the SetInitializer method of the Database object.
         Or, if you choose not to trust the auto-migration, generate a TSQL script to perform the migration.
         TSQL scripts can be generated from the Package Manager Console

    ExpressProfiler is a simple SQL profiler… find it on CodePlex.

    Session:  Introduction to Knockout.js

    Presenter:  http://www.twitter.com/johnvpetersen
    Session Materials: https://skydrive.live.com/?cid=4b5cb012cf825f0c&id=4B5CB012CF825F0C%213211&authkey=!APKGSzHxmC91400

    What is it?

         JS library for dynamic web-based UI’s
         Applies MVVM to automate data binding

    Concepts

         Declarative bindings
         Dependency Tracking
         Templating
         Automatic UI Refresh
         Dependency Injection

    MVVM (Model-View-ViewModel) Pattern

         Combination of the MVC/MVP patterns
         View – UI and UI Logic, talks with ViewModel and receives notifications from ViewModel
         ViewModel – Presentation Logic, talks with View (data binding and commands (bi-directional), notifications [to View]) and Model (bi-directional)
         Model – Business Logic and Data, talks with ViewModel

    Data Binding

    Data

         Knockout.js implements the ViewModel

     var myViewModel = function() {
          var data = [{ productid: 1, productname: "shoe", productprice: 1.99 }];

          this.property = ko.observable("value");
          this.products = ko.observableArray(data);     // "data" is an array of products
          this.handler = function(data, event) {};
     };

     ko.applyBindings(new myViewModel());

     "ko" is the global identifier for Knockout

    Binding

    Attributes of HTML elements are bound to the ViewModel properties (also CSS and conditional logic like "foreach" and "if")

         <input data-bind="value: property" />

         <button data-bind="click: handler"></button>

         <tbody data-bind="foreach: products">

              <tr><td><input data-bind="value: productid"></td></tr>

         </tbody>

    Individual elements can be bound to more than one property (example: "text" bound to one thing, "visible" bound to another)

    Session:  Real World Azure – How We Use Azure at Swank HealthCare

    Presenter: Brad Tutterow

    SQL in an Azure VM vs SQL Azure Database
         VM option does place your database in the cloud
              BUT
         VM option still requires you to do your own backups/restores/server maintenance
         VM option does not provide for scalability of a "true" cloud DB

         SQL Azure DB is Microsoft’s preference

    DB Changes that were needed for SQL Azure Database
         Remove Cross-DB triggers
         Remove file groups in CREATE scripts
         Account for cloud-based SQL being a limited subset of full SQL
              Example: No "USE" statement, so scripts may need update
         Modify backup strategy (no traditional Backup/Restore in the cloud)

    Always run two of every Web Role
         Roles are frequently recycled by Azure
         If only one Role exists, your site is down when the Role is recycled
         If two Roles exist, Azure will switch between the two as needed, and not recycle both at the same time

    Deployment best practices
         Determine which application settings (web.config) need to be changed at runtime
              Move those settings to Azure settings
              Everything else can stay in the web.config
              Changing web.config in production doesn’t "stick"… Role recycle will wipe the changes
         Create deployment packages
              Role recycle will wipe updates if not deployed via a package
         Make no assumptions about what is available on the server
              You must deploy everything your app needs (all NuGet packages, etc)
              Role recycle produces fresh copy of Windows

    Database updates handled via EF Code-First Database Migrations
         Question: How would updates be handled without Code-First, or with some other ORM?

    Pain points
         Local Azure emulator is unreliable and inconsistent
         No effective way to do QA on-premise (means more cost for a QA environment in Azure)
         Learning curve (not too bad)
         Azure SDK versioning (keeping everything in sync… updates are quarterly)
         EF migrations and Azure SQL (scripts don’t always work in Azure; need to be edited)

    Good things
         Uptime and reliability
         Buy-in from sales/operations/infrastructure
         Enforced best practices for design and deployment
         Pristine/clean production environments
         QA/Prod environments are identical
         No IIS or Windows OS management
         Easy deployments

    TO-DO LIST

    • Investigate SQL Server Change Data Capture as a replacement for auditing with triggers.
    • http://virtualboxes.org/images/
    • Check out DurandalJS (mentioned in several sessions)
    • Check out Twitter Bootstrap
    • Check out LESS
    • Check out Glimpse
    • Think about what could be done with a large OCR corpus and Hadoop 

    Parsing Delimited Text Files with LINQ

    A simple LINQ query can be used to parse delimited text files into a list of objects.

    Consider a tab-delimited file named Data.txt that contains contact information.  Specifically, it contains Names, Phone Numbers, Birth Dates, and Email Addresses, like this:

    Joe Smith    111-222-3333     1/1/1980      joe.smith@zzzz.com
    John Doe     444-555-6666     7/31/1970     john.doe@zzzz.com
    Jane Doe     666-777-8888     4/25/1975     jane.doe@zzzz.com

    Assume that the following class exists:

    class Contact
    {
        public string Name { get; set; }
        public string Phone { get; set; }
        public string BirthDate { get; set; }
        public string Email { get; set; }
    }

    This LINQ query will produce a list of Contact objects that are populated with the information in the text file:

    var contacts = from line in System.IO.File.ReadAllLines(@"Data.txt")
                   let parts = line.Split('\t')
                   select new Contact
                   {
                       Name = parts[0],
                       Phone = parts[1],
                       BirthDate = parts[2],
                       Email = parts[3]
                   };
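
    The Select portion of the query is deferred; the lines are parsed into Contact objects as the results are enumerated (or when ToList() is called).  For example:

    foreach (Contact contact in contacts)
    {
        Console.WriteLine("{0} - {1}", contact.Name, contact.Email);
    }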

    A Simple Parallel.ForEach Example

    Previously, I wrote about C# implementations of algorithms used to compare a document to a dictionary of documents.  You can see the original article here, and download the source code here.

    The main processing loops are similar for each algorithm.  Simply put, strings are compared by looping through each entry in the dictionary and applying a comparison algorithm to the target string and the current dictionary entry. 

    Some of these algorithms are slow, particularly with large dictionaries.  In each case, the actual comparison part of the loop is the most expensive operation, and it is performed one-dictionary-entry at a time.  This loop is a prime candidate for parallelization.  Let’s see how this is accomplished with the use of a simple Parallel.ForEach statement.

    Here is an example of the main loop for the string comparison algorithms: 

    foreach (KeyValuePair<string, string> kvp in Dictionary)
    {
        string dictionaryItem = kvp.Value;

        // Compute the Levenshtein distance between the target document and the current dictionary item
        int score = lv.GetDistance(document, dictionaryItem);

        // Save the result
        ResolutionResult result = new ResolutionResult();
        result.Key = kvp.Key;
        result.Document = kvp.Value;
        result.Score = score;
        results.Add(result);
    }

    The GetDistance method compares the strings using the Levenshtein Distance algorithm and produces a score that represents how similar the strings are.  That score is then stored in a generic list of ResolutionResult objects.

    And here is the same loop, implemented to allow iterations over the loop to run in parallel:

    object _lock = new object();
    Parallel.ForEach(Dictionary, kvp =>
        {
            string dictionaryItem = kvp.Value;

            // Compute the Levenshtein distance between the target doc and the current dictionary item
            int score = lv.GetDistance(document, dictionaryItem);

            // Save the result
            ResolutionResult result = new ResolutionResult();
            result.Key = kvp.Key;
            result.Document = kvp.Value;
            result.Score = score;
            lock (_lock) { results.Add(result); }
        });

    The foreach statement is replaced with Parallel.ForEach.  Note that in order to use Parallel.ForEach, a “using” statement for System.Threading.Tasks is required (not shown in the example). 

    Other than that, the body of the loop is the same, with one exception.  The results.Add(result) statement is now wrapped in a lock(object) { } statement.  This is because the results list is updated by all iterations of the loop, which now run in parallel.  If the list is not locked during the Add operation, multiple iterations of the loop may try to update the list at the same time, resulting in conflicts and errors.
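
    An alternative to explicit locking is to collect the results in a thread-safe collection such as ConcurrentBag<T> from the System.Collections.Concurrent namespace.  Here is a sketch using the same types as the example above:

    var results = new ConcurrentBag<ResolutionResult>();
    Parallel.ForEach(Dictionary, kvp =>
        {
            int score = lv.GetDistance(document, kvp.Value);

            // ConcurrentBag.Add is thread-safe, so no explicit lock is needed
            results.Add(new ResolutionResult { Key = kvp.Key, Document = kvp.Value, Score = score });
        });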

    Tests on a dual core processor show that the loop that uses the Parallel.ForEach operation takes a little more than half as long as the loop that uses the simple foreach, so the desired performance improvement is realized.  And most importantly, the scores produced by the algorithm are the same.

    When adding parallelization to applications, it is important to verify the results to be sure that no unexpected side effects are introduced.  For example, a loop similar to the one shown above that uses a Term Frequency-Inverse Document Frequency comparison instead of a Levenshtein Distance comparison does NOT produce the same results when Parallel.ForEach is introduced in the same manner.  Not all operations lend themselves to parallel processing, so it is very important to verify the outputs before and after introducing parallelization!