Log Parser – Transforming Plain Text Files

This post describes how to solve a specific problem with Microsoft’s Log Parser tool.  For background on the tool (and lots of examples), start here.

The Problem

Given a file named MyLog.log that looks like this…

ip= date=20160620 time=06:00:00 device=A23456789 log=00013
ip= date=20160621 time=06:00:01 device=A13456789 log=00014
ip= date=20160622 time=06:00:02 device=A12456789 log=00015
ip= date=20160623 time=06:00:03 device=A12356789 log=00016
ip= date=20160624 time=06:00:04 device=A12346789 log=00017
ip= date=20160625 time=06:00:05 device=A12345789 log=00018
ip= date=20160626 time=06:00:06 device=A12345689 log=00019
ip= date=20160627 time=06:00:07 device=A12345679 log=00020
ip= date=20160628 time=06:00:08 device=A12345678 log=00021
ip= date=20160629 time=06:00:09 device=A123456789 log=00022

…transform it into a tab-separated file with a header row.  Each field should include only the field value (and not the field name).

Notice that the original file has no header, the fields are separated with spaces, and the field name is part of each field (i.e. "ip=").

The Solution

Step 1)

logparser -i:TSV -iSeparator:space -headerRow:OFF
     "select * into 'MyLogTemp.log' from 'MyLog.log'"
     -o:TSV -oSeparator:space -headers:ON

In this command, -i:TSV -iSeparator:space informs Log Parser that the input file is a space-separated text file, and -headerRow:OFF lets Log Parser know that the file has no headers.  Likewise, -o:TSV -oSeparator:space -headers:ON tells Log Parser to output a space-separated text file with headers.

This produces a file named MyLogTemp.log with the following content:

Filename RowNumber Field1 Field2 Field3 Field4 Field5
MyLog.log 1 ip= date=20160620 time=06:00:00 device=A23456789 log=00013
MyLog.log 2 ip= date=20160621 time=06:00:01 device=A13456789 log=00014
MyLog.log 3 ip= date=20160622 time=06:00:02 device=A12456789 log=00015
MyLog.log 4 ip= date=20160623 time=06:00:03 device=A12356789 log=00016
MyLog.log 5 ip= date=20160624 time=06:00:04 device=A12346789 log=00017
MyLog.log 6 ip= date=20160625 time=06:00:05 device=A12345789 log=00018
MyLog.log 7 ip= date=20160626 time=06:00:06 device=A12345689 log=00019
MyLog.log 8 ip= date=20160627 time=06:00:07 device=A12345679 log=00020
MyLog.log 9 ip= date=20160628 time=06:00:08 device=A12345678 log=00021
MyLog.log 10 ip= date=20160629 time=06:00:09 device=A123456789 log=00022

This hasn’t done much.  In fact, it has added some stuff that is not relevant (the Filename and RowNumber columns), while leaving the field names in each field and keeping the space separator.  However, it HAS added headers (Field1, Field2, etc.), which are needed for the second step.

Step 2)

logparser -i:TSV -iSeparator:space -headerRow:ON
     "select REPLACE_STR(Field1, 'ip=', '') AS ip,
               REPLACE_STR(Field2, 'date=', '') AS date,
               REPLACE_STR(Field3, 'time=', '') AS time,
               REPLACE_STR(Field4, 'device=', '') AS device,
               REPLACE_STR(Field5, 'log=', '') AS log
     into 'MyLogTransformed.log'
     from 'MyLogTemp.log'"
     -o:TSV -oSeparator:tab -headers:ON

The input and output specifications in this command are similar to those in Step 1, except here the input file has headers (-headerRow:ON) and the output file is tab-separated (-oSeparator:tab) instead of space-separated.  The main difference is in the SELECT statement itself, where the REPLACE_STR function removes the field names from the field values and the AS clause assigns the desired header to each column of data.  Notice that the REPLACE_STR function references the headers (Field1 through Field5) that were added in Step 1.

This produces the final result in a file named MyLogTransformed.log:

ip     date     time     device     log
       20160620     06:00:00     A23456789     00013
       20160621     06:00:01     A13456789     00014
       20160622     06:00:02     A12456789     00015
       20160623     06:00:03     A12356789     00016
       20160624     06:00:04     A12346789     00017
       20160625     06:00:05     A12345789     00018
       20160626     06:00:06     A12345689     00019
       20160627     06:00:07     A12345679     00020
       20160628     06:00:08     A12345678     00021
       20160629     06:00:09     A123456789     00022

(The ip column is empty in every row because the ip= fields in the source file have no values.)
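For environments without Log Parser, the same transformation can be sketched in a few lines of Python.  This is an illustrative alternative, not part of the Log Parser solution; it assumes the field names shown in the sample file and collapses both steps into a single pass:

```python
def transform(lines, fields=("ip", "date", "time", "device", "log")):
    """Turn space-separated name=value records into tab-separated rows
    with a header row, mirroring the two Log Parser steps in one pass."""
    rows = ["\t".join(fields)]                           # header row
    for line in lines:
        # "ip=" splits into ("ip", ""), so empty values survive as blanks
        pairs = dict(token.split("=", 1) for token in line.split())
        rows.append("\t".join(pairs.get(f, "") for f in fields))
    return "\n".join(rows) + "\n"

sample = [
    "ip= date=20160620 time=06:00:00 device=A23456789 log=00013",
    "ip= date=20160621 time=06:00:01 device=A13456789 log=00014",
]
print(transform(sample))
```

To process the real file, pass open('MyLog.log') to transform and write the result to MyLogTransformed.log.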

More Information

See Log Parser’s built-in help for additional explanations of the Log Parser features used in the solution.  In particular, look at the following:

logparser -h
logparser -h -i:TSV
logparser -h -o:TSV

Estimating SQL Server Table Sizes

Microsoft gives guidance for calculating the estimated sizes of your tables and indexes at http://msdn.microsoft.com/en-us/library/ms175991.aspx.  Performing such estimates is an important step in determining required server capacity.

Unfortunately, Microsoft’s guidance consists of long textual descriptions of the steps that need to be followed to perform the estimations.  It takes some time to interpret the instructions given by Microsoft and perform the calculations.

With that in mind, I put together a spreadsheet to help me perform the calculations.  To use it, I fill in the total number of rows in the table and the details of the columns (data types and sizes) in the table and/or index.  In my experience, the estimates provided by the spreadsheet are within 5-10% of the actual sizes of the tables/indexes, which has been good enough for my needs.

You can download the spreadsheet here.

To complete the spreadsheet, simply enter the requested data into the cells with black borders.  For example, enter the expected number of rows in the table in the cell labeled “Rows in Table”.  Enter the number of columns in the table in the cell labeled “# of Columns”.  Continue until you have specified all of the information about the table or index.  The spreadsheet uses the information to calculate the total space needed for the table or index.

To help you get started, the spreadsheet includes sample data for a table that contains 1699488 rows and has six columns.  This table has two integer columns (one of which is the primary key), two datetime columns, one tinyint column, and one nvarchar column.  The average size of the data in the nvarchar column is 12 characters.  Compare these table specifications to the sample data in the spreadsheet for an example of how to map the specifications for your table/index into the spreadsheet.
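The arithmetic behind estimates like these can be sketched in Python.  This follows Microsoft's published steps for estimating the size of a heap (null bitmap, variable-length block, per-row header, rows per 8 KB page); it is a simplification of what the spreadsheet does — a clustered index adds some overhead on top of this — and it plugs in the average variable-length data size, where Microsoft's worst-case calculation uses the maximum:

```python
import math

def estimate_heap_size(num_rows, fixed_data_size, num_cols,
                       num_variable_cols=0, var_data_size=0):
    """Rough heap-size estimate following Microsoft's published steps.

    fixed_data_size: total bytes of all fixed-length columns in one row.
    var_data_size:   bytes of variable-length data per row (an average
                     here; Microsoft's worst case uses the maximum).
    Returns the estimated size in bytes.
    """
    null_bitmap = 2 + (num_cols + 7) // 8
    variable = (2 + num_variable_cols * 2 + var_data_size
                if num_variable_cols else 0)
    row_size = fixed_data_size + variable + null_bitmap + 4  # 4-byte row header
    rows_per_page = 8096 // (row_size + 2)   # 8096 free bytes per page
    num_pages = math.ceil(num_rows / rows_per_page)
    return num_pages * 8192                  # pages are 8 KB

# The sample table from the spreadsheet: 1,699,488 rows; two int, two
# datetime, one tinyint (fixed: 4+4+8+8+1 = 25 bytes) in 6 columns,
# plus one nvarchar averaging 12 characters (24 bytes of variable data).
print(estimate_heap_size(1_699_488, 25, 6, 1, 24))  # prints 107094016 (~102 MB)
```

The sketch reproduces the shape of the calculation; for capacity planning, work through Microsoft's full instructions (or the spreadsheet), which also account for index-specific overhead.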

Note that while Microsoft provides guidance for estimating sizes of tables, clustered indexes, and non-clustered indexes, this spreadsheet only handles tables and clustered indexes.  I originally intended to add the calculations for non-clustered indexes, but simply did not find the time.  So, I am providing the spreadsheet as-is.  It should, at the very least, provide a good starting point for estimating table and index sizes.

St. Louis Day of .NET 2011 – Reflections and Session Materials

The annual St. Louis Day of .NET conference was held August 5th and 6th.  It was the largest yet, with over 800 attendees.

The keynote address, given by Microsoft’s Jay Schmelzer, focused on Visual Studio Lightswitch.  Lightswitch is billed as a simplified development tool for creating business applications.  I have read comparisons to Microsoft Access, and from what I saw at the keynote, those comparisons are apt.  I’ll give more detail about the keynote and the other sessions that I attended in future posts.

Interestingly, I did not hear anyone talking about Lightswitch after the keynote.  I don’t think I’ve ever been at a conference where there was no further discussion of the keynote.

Among the sessions that I attended were three focusing on MVC… none were great, or even particularly good.  This was very disappointing, as it was a topic of particular interest to me.  On the other hand, sessions on Entity Framework 4.1, SOLID design principles, Powershell, Inversion of Control containers, and jQuery Plug-ins were good.  Two speakers who particularly impressed me were Steve Bohlen (SOLID and Inversion of Control) and Muljadi Budiman (Objective-C).

I found it interesting (but not surprising) that the sessions on Objective-C and jQuery were filled to overflowing, despite their focus on non-Microsoft and/or vendor-agnostic technologies.  I was happy to see that the developer community, while supportive of Microsoft, isn’t blindly supportive.

One last note… there were a large number of sponsors of this year’s event, and I was impressed by the number that were hiring or looking for contract workers. One hopes that this bodes well for developers looking for work in the St. Louis area.

As far as I know, the conference web site (http://www.stlouisdayofdotnet.com/2011/) doesn’t provide a way for speakers to deposit their session materials.  Because of that, slides and sample code have been slowly trickling out on personal blogs and web sites.  I have been attempting to accumulate as many links to those materials as I can find, and those links are listed below.  Full session descriptions can be found here.

Application Development With HTML5

Architecting Applications the Microsoft Way

Architecting for Massive Scalability

Building jQuery Plugins

Building Office 365 Web Parts

Common UX Pitfalls

Consuming Data From Many Platforms: The Benefits of OData

Customizing BlogEngine.NET

Design Quality: Learning From the Mistakes of the US Auto Industry

DotNetNuke Jumpstart

Embrace the Buzz – Building Modern DotNetNuke Extensions

Get Funcy With C# and F#

IE9 Pinned Sites

Introduction to User Experience Methods

Javascript is not C#

Jumpstart: SharePoint Web Part Development

Keynote: Lightswitch

Lightswitch Jumpstart

The “Little Pitfalls” of C#/.NET

The “Little Wonders” of C#/.NET

Making $$$ with Windows Phone 7

The Making of the Day of .NET Web Site with Orchard CMS

Overview of Windows Azure Marketplace DataMarket

Powershell: Not Your Father’s Command Line

Reactive Extensions (Rx) in .NET

Refactoring to a SOLID Foundation

Startup Tips and Tricks

Taming Your Dependencies With Inversion of Control Containers

The Three Tasty Flavors of Entity Framework 4.1

Unit Testing Patterns and Anti-Patterns

What’s Wrong With This Picture?

Why You Should Care the Sony PSN Was Hacked