A Simple Parallel.ForEach Example

Previously, I wrote about C# implementations of algorithms used to compare a document to a dictionary of documents.  You can see the original article here, and download the source code here.

The main processing loops are similar for each algorithm.  Simply put, strings are compared by looping through each entry in the dictionary and applying a comparison algorithm to the target string and the current dictionary entry. 

Some of these algorithms are slow, particularly with large dictionaries.  In each case, the actual comparison part of the loop is the most expensive operation, and it is performed one-dictionary-entry at a time.  This loop is a prime candidate for parallelization.  Let’s see how this is accomplished with the use of a simple Parallel.ForEach statement.

Here is an example of the main loop for the string comparison algorithms: 

foreach (KeyValuePair<string, string> kvp in Dictionary)
{
    string dictionaryItem = kvp.Value;

      // Compute the Levenstein distance between the target document and the current dictionary item

    int score = lv.GetDistance(document, dictionaryItem);

    // Save the result
    ResolutionResult result = new ResolutionResult();
    result.Key = kvp.Key;
    result.Document = kvp.Value;
    result.Score = score;
    results.Add(result);
}
 

The GetDistance method compares the strings using the Levenshtein Distance algorithm and produces a score that represents how similar the strings are.  That score is then stored in a generic list of ResolutionResult objects.

And here is the same loop, implemented to allow iterations over the loop to run in parallel:

object _lock = new object();
Parallel.ForEach(Dictionary, kvp =>
    {
        string dictionaryItem = kvp.Value;

          // Compute the Levenstein distance between the target doc and the current dictionary item

        int score = lv.GetDistance(document, dictionaryItem);

          // Save the result

        ResolutionResult result = new ResolutionResult();
        result.Key = kvp.Key;
        result.Document = kvp.Value;
        result.Score = score;
        lock (_lock) { results.Add(result); }
    });

The foreach statement is replaced with Parallel.ForEach.  Note that in order to use Parallel.ForEach, a “using” statement for System.Threading.Tasks is required (not shown in the example). 

Other than that, the body of the loop is the same, with one exception.  The results.Add(result) statement is now wrapped in a lock(object) { } statement.  This is because the results list is updated by all iterations of the loop, which now run in parallel.  If the list is not locked during the Add operation, multiple iterations of the loop may try to update the list at the same time, resulting in conflicts and errors.

Tests on a dual core processor show that the loop that uses the Parallel.ForEach operation takes a little more than half as long as the loop that uses the simple foreach, so the desired performance improvement is realized.  And most importantly, the scores produced by the algorithm are the same.

When added parallelization to applications, it is important to verify the results to be sure that no unexpected side effects are introduced.  For example, a similar loop to that shown above which uses a Term Frequency-Inverse Document Frequency comparison instead of a Levenshtein Distance comparison does NOT produce the same results when Parallel.ForEach is introduced in the same manner.  Not all operations lend themselves to parallel processing, so it very important to verify the outputs before and after introducing paralleization!

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: