Log Parser – Transforming Plain Text Files

This post describes how to solve a specific problem with Microsoft’s Log Parser tool.  For background on the tool (and lots of examples), start here.

The Problem

Given a file named MyLog.log that looks like this…

ip=0.0.0.0 date=20160620 time=06:00:00 device=A23456789 log=00013
ip=0.0.0.1 date=20160621 time=06:00:01 device=A13456789 log=00014
ip=0.0.0.2 date=20160622 time=06:00:02 device=A12456789 log=00015
ip=0.0.0.3 date=20160623 time=06:00:03 device=A12356789 log=00016
ip=0.0.0.4 date=20160624 time=06:00:04 device=A12346789 log=00017
ip=0.0.0.5 date=20160625 time=06:00:05 device=A12345789 log=00018
ip=0.0.0.6 date=20160626 time=06:00:06 device=A12345689 log=00019
ip=0.0.0.7 date=20160627 time=06:00:07 device=A12345679 log=00020
ip=0.0.0.8 date=20160628 time=06:00:08 device=A12345678 log=00021
ip=0.0.0.9 date=20160629 time=06:00:09 device=A123456789 log=00022

…transform it into a tab-separated file with a header row.  Each field should include only the field value (and not the field name).

Notice that the original file has no header row, the fields are separated with spaces, and the field name is part of each field value (e.g. "ip=").

The Solution

Step 1)

logparser -i:TSV -iSeparator:space -headerRow:OFF
     "select * into 'MyLogTemp.log' from 'MyLog.log'"
     -o:TSV -oSeparator:space -headers:ON

In this command, -i:TSV -iSeparator:space informs Log Parser that the input file is a space-separated text file, and -headerRow:OFF lets Log Parser know that the file has no headers.  Likewise, -o:TSV -oSeparator:space -headers:ON tells Log Parser to output a space-separated text file with headers.

This produces a file named MyLogTemp.log with the following content:

Filename RowNumber Field1 Field2 Field3 Field4 Field5
MyLog.log 1 ip=0.0.0.0 date=20160620 time=06:00:00 device=A23456789 log=00013
MyLog.log 2 ip=0.0.0.1 date=20160621 time=06:00:01 device=A13456789 log=00014
MyLog.log 3 ip=0.0.0.2 date=20160622 time=06:00:02 device=A12456789 log=00015
MyLog.log 4 ip=0.0.0.3 date=20160623 time=06:00:03 device=A12356789 log=00016
MyLog.log 5 ip=0.0.0.4 date=20160624 time=06:00:04 device=A12346789 log=00017
MyLog.log 6 ip=0.0.0.5 date=20160625 time=06:00:05 device=A12345789 log=00018
MyLog.log 7 ip=0.0.0.6 date=20160626 time=06:00:06 device=A12345689 log=00019
MyLog.log 8 ip=0.0.0.7 date=20160627 time=06:00:07 device=A12345679 log=00020
MyLog.log 9 ip=0.0.0.8 date=20160628 time=06:00:08 device=A12345678 log=00021
MyLog.log 10 ip=0.0.0.9 date=20160629 time=06:00:09 device=A123456789 log=00022

This hasn’t done much.  In fact, it has added some stuff that is not relevant (the Filename and RowNumber columns), while leaving the field names in each field and keeping the space separator.  However, it HAS added headers (Field1, Field2, etc.), which are needed for the second step.

Step 2)

logparser -i:TSV -iSeparator:space -headerRow:ON
     "select REPLACE_STR(Field1, 'ip=', '') AS ip,
               REPLACE_STR(Field2, 'date=', '') AS date,
               REPLACE_STR(Field3, 'time=', '') AS time,
               REPLACE_STR(Field4, 'device=', '') AS device,
               REPLACE_STR(Field5, 'log=', '') AS log
     into 'MyLogTransformed.log'
     from 'MyLogTemp.log'"
     -o:TSV -oSeparator:tab -headers:ON

The input and output specifications in this command are similar to those in Step 1, except here the input file has headers (-headerRow:ON) and the output file is tab-separated (-oSeparator:tab) instead of space-separated.  The main difference is in the SELECT statement itself, where the REPLACE_STR function strips the field names from the field values and the AS clauses assign the desired headers to each column of data.  Notice that REPLACE_STR operates on the Field1–Field5 headers that were added in Step 1.

This produces the final result in a file named MyLogTransformed.log:

ip     date     time     device     log
0.0.0.0     20160620     06:00:00     A23456789     00013
0.0.0.1     20160621     06:00:01     A13456789     00014
0.0.0.2     20160622     06:00:02     A12456789     00015
0.0.0.3     20160623     06:00:03     A12356789     00016
0.0.0.4     20160624     06:00:04     A12346789     00017
0.0.0.5     20160625     06:00:05     A12345789     00018
0.0.0.6     20160626     06:00:06     A12345689     00019
0.0.0.7     20160627     06:00:07     A12345679     00020
0.0.0.8     20160628     06:00:08     A12345678     00021
0.0.0.9     20160629     06:00:09     A123456789     00022
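For comparison, the same transformation can be sketched in a few lines of Python.  This is just an illustration (not part of the Log Parser solution), with the sample data inlined for clarity:

```python
# Parse each space-separated "name=value" line and emit tab-separated
# values under a header row, mirroring the two-step Log Parser solution.
sample = """\
ip=0.0.0.0 date=20160620 time=06:00:00 device=A23456789 log=00013
ip=0.0.0.1 date=20160621 time=06:00:01 device=A13456789 log=00014"""

fields = ["ip", "date", "time", "device", "log"]

lines = ["\t".join(fields)]  # header row
for line in sample.splitlines():
    # Split the line into "name=value" tokens, then split each token once on "="
    pairs = dict(token.split("=", 1) for token in line.split())
    lines.append("\t".join(pairs[f] for f in fields))

output = "\n".join(lines)
print(output)
```

Of course, the point of Log Parser is that you get this without writing any code, and the same commands scale to other formats and much larger files.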

More Information

See Log Parser’s built-in help for additional explanations of the Log Parser features used in the solution.  In particular, look at the following:

logparser -h
logparser -h -i:TSV
logparser -h -o:TSV
logparser -h FUNCTIONS REPLACE_STR

More Log Parser Resources

I’ve previously blogged about my favorite tool for IIS log analysis, Log Parser.

You can see my previous post here.  At the end of that post I list links to additional Log Parser discussion and examples.  Today I found a couple more resources that belong on that list:

  • This one, from the Nuttin but Exchange blog, is similar to what I had posted, but touches on some additional options and functions that I did not describe.
  • LogParserPlus.com is a web site devoted specifically to Log Parser.  It includes articles, examples, and comprehensive lists of Log Parser expressions and functions.

A GUI for Log Parser?

In an earlier post on this blog, I discussed the Log Parser tool from Microsoft, and provided an extensive list of examples showing how to use the tool.  I also wondered why Log Parser has never gained more traction among developers who use Microsoft’s tools.

Recently I received a comment on that blog entry.  The commenter praised the information I had provided, and then went on to mention a tool that provides a GUI for Log Parser.  (If you are not familiar with Log Parser, it is a command line tool for querying text-based data and Windows data sources such as the event log).

I thought the comment was nice, but wondered about the name-dropping of the GUI tool.  WordPress captures the IP address of commenters, so a quick web query or two later I knew that the person leaving the comment was based in Macedonia.  A glance at the web site for the GUI tool revealed that it is made by a company based in (you guessed it) Macedonia.  The comment was spam.  Polite spam.  But still.  Darn.

I considered dropping the comment, but then had a second thought.  Perhaps the reason that Log Parser never gained more attention is that (sadly) to many Windows-based developers, the command line interface is a form of Kryptonite.  Maybe a GUI is really what Log Parser needed, and maybe the commenter’s tool was worth investigating.  So I decided to give it a test drive and report my impressions here.

Log Parser Lizard: Installation and Overview

The name of the tool is Log Parser Lizard from Lizard Labs.  The application comes in both a free version and a paid version (which includes a few extra features).  The listed prerequisites are Log Parser itself (obviously) and version 2.0 of the .NET Framework.  The free version can be downloaded from http://www.lizard-labs.net/log_parser_lizard.aspx.  Tutorials and screenshots are also provided there (you might want to refer to those while reading this post).

After downloading the MSI, I started the installation process.  There were no surprises.  It was a standard Windows installation experience.  Specify a few options (where to install, etc.), hit go, and it’s done.  A Log Parser Lizard group was added to the start menu, but no shortcuts were added to the desktop or elsewhere.

Upon running Log Parser Lizard the first time I was presented with an About dialog that explained the tool and asked for support in the form of blog posts, feedback, donations, or a purchased license.  Disappointingly, the dialog contained a few grammatical and spelling errors (please register this “peace” of software?  “visit lziard-labs.net”?).  I hate to be the grammar police, but you only get one chance to make a first impression, right?  It’s worth noting that this dialog was the only place in the application where I noticed such errors.

The overall appearance of the tool is nice and clean.  The main UI has a set of panels on the left-hand side of the application labeled as follows: IIS Logs, Event Logs, Active Directory, Log4Net, File System, and T-SQL.  Selecting each of these panels reveals sets of example queries.  There are options along the top of the UI that allow users to manage the queries that appear in these panels. 

The query management option provides a way to collect and manage Log Parser queries, which is useful.  In my experience using Log Parser I’ve ended up with folders full of saved queries, so it would be nice to have a better way to organize them.  A downside I can see is that queries are stored within the application, and I did not find a way to export the queries in bulk.  I imagine this would complicate moving saved queries to a second computer.

When a new query is created, or an existing query is opened (either from a query managed within the application or from an external file), additional options appear for specifying the format of the input file (examples are “IIS W3C Log”, “HTTP Error Log”, and “Comma Separated Values”), setting properties of the query, and executing the query.  Query results can be viewed as a grid or a graph, and each of these output formats has multiple options.

There is one more feature worth mentioning.  In addition to the query management feature, there is an option to manage “Constants” which can be referenced in queries.  For example, assume that all of my IIS log files reside in a folder named \\SERVER\Share\.  I could set up a constant named “FILEPATH” and assign it a value of \\SERVER\Share\.  Then, the constant could be referenced in queries like this:  “SELECT * FROM #FILEPATH#ex000000.log”.
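The mechanics of a constants feature like this are simple: placeholders of the form #NAME# are expanded in the query text before it is handed to Log Parser.  Here is a hypothetical Python sketch of the idea (the expand function and constant names are my own illustration, not Log Parser Lizard’s implementation):

```python
# Expand #NAME# placeholders in a query string using a table of constants.
def expand(query, constants):
    for name, value in constants.items():
        query = query.replace("#" + name + "#", value)
    return query

# Hypothetical constant pointing at a UNC share full of IIS logs.
constants = {"FILEPATH": "\\\\SERVER\\Share\\"}

q = expand("SELECT * FROM #FILEPATH#ex000000.log", constants)
print(q)  # SELECT * FROM \\SERVER\Share\ex000000.log
```

If the share ever moves, only the constant needs to change, not every saved query.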

Building A Query

When I use Log Parser, my most common use case is to query one or more IIS W3C log files.  Typically these files reside on a remote network share.  The biggest challenge I face in using Log Parser is remembering/typing the path to the log files, and remembering/typing the names of the fields to be queried.  It is easy for typos to occur, and it’s difficult to work with long queries on the command line.

So, does Log Parser Lizard help solve my biggest Log Parser challenges?  Not really.  I had expected that the GUI would allow me to browse to and select the files which I wanted to query.  I also had hoped that the tool would present me with a list of the queryable columns in the selected files.  For known file types (like the IIS W3C log format), I expected this would be possible.  Instead, as far as I could see, Log Parser queries have to be constructed largely as they would be at the command line. 

Composing queries in the tool’s UI is more convenient than composing on the command line.  However, there is no way (that I could find) to browse/select files, nor is drag-and-drop of log files possible.  In addition, the column names in the data sources still need to be typed; the tool offers no assistance.  The ability to set up constants to hold file paths and column names is useful, but not exactly the type of help I was hoping to find.

(Note: I did find an “Edit Mode” option in the UI which opens up a panel that appears to provide file browse/selection functionality.  Unfortunately, the “Add” option revealed in the panel was not active.  I could have been doing something wrong, but my assumption is that this is a feature only available to registered users of the tool.)

Executing a Query

To test the tool, I executed a simple query over a week’s worth of IIS logs for a web site on which I am lead developer.  I wanted to see a list of all of the hits on the site’s search page in that week.  Running Log Parser natively produced a 6MB CSV file containing about 11,000 rows.  The query completed in a matter of seconds.  Using Log Parser Lizard, the query took more than 10 minutes to complete.  During much of that time the CPU utilization was at 100% and most of the memory on the machine was being consumed.  Admittedly, the test was run on an underpowered (1GB RAM) virtual machine running Windows XP, but that is a dramatic difference between the command line and the GUI.

More targeted queries that return much smaller result sets seem to perform much better.

Working With Query Results

Once a query is constructed and executed, Log Parser Lizard does a nice job of presenting the results in a format that allows for quick and useful analysis.  Results can be grouped, filtered, and sorted.  Columns can be hidden and/or shown as needed.  Response time is fast, even though the initial querying of the data was quite slow. 

There are several options for outputting query results (such as Excel and PDF), but those are only available to registered users of the tool.

Conclusion

In conclusion, the free version of Log Parser Lizard does a couple of things well, and a couple of things not so well.  The query management feature provides a nice way to organize Log Parser queries.  The tools for analyzing query results are also solid.  On the other hand, the tool doesn’t do much to simplify the construction of Log Parser queries.  And performance of the tool for queries that return large result sets is poor.

Log Parser Lizard is a solid tool which could use some work in a few areas.  I’m not sure it will make it into my toolbox, but others might find it useful.