Parsing Raw Data and Log Files (Part 1)

by May 27, 2021

Most raw log files are tabular: even though they may not be fully CSV-compliant, they typically have columns separated by some sort of delimiter, and sometimes even a header.

Here is an example taken from an IIS log. When you look around, plenty of log files organize their data fundamentally in a tabular way like this:

#Software: Microsoft Internet Information Services 10.0
#Version: 1.0
#Date: 2018-02-02 00:03:04
#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) cs(Referer) sc-status sc-substatus sc-win32-status time-taken
2021-05-02 00:00:04 10.10.12.5 GET /Content/anonymousCheckFile.txt - 8530 - 10.22.121.248 - - 200 0 0 0
2021-05-02 00:00:04 10.10.12.5 GET /Content/anonymousCheckFile.txt - 8531 - 10.22.121.248 - - 200 0 0 2

Rather than writing complex code to read and parse your log files, it is worth comparing them to the standard CSV format to see whether they can be treated as CSV.

In the example of IIS logs above, the findings would be:  

  • Delimiter is a space (not a comma)
  • Fields (column names) are documented in a comment line (not in a header line)
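
Both findings can be checked with a few lines of code. The following is a minimal sketch, not part of the original tip: it keeps a few sample lines from the log above in an inline array (the variable names $lines, $columns and $dataCount are made up for illustration), extracts the column names from the #Fields comment, and compares their count to the number of space-separated values in the first data line.

```powershell
# Sample lines copied from the IIS log shown above, kept inline so the sketch is self-contained
$lines = @(
    '#Software: Microsoft Internet Information Services 10.0'
    '#Version: 1.0'
    '#Date: 2018-02-02 00:03:04'
    '#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) cs(Referer) sc-status sc-substatus sc-win32-status time-taken'
    '2021-05-02 00:00:04 10.10.12.5 GET /Content/anonymousCheckFile.txt - 8530 - 10.22.121.248 - - 200 0 0 0'
)

# Find the comment line that documents the fields
$fieldLine = $lines | Where-Object { $_ -like '#Fields:*' } | Select-Object -First 1

# Strip the "#Fields:" prefix and split on spaces to get the column names
$columns = ($fieldLine -replace '^#Fields:\s*', '') -split ' '

# Split the first non-comment line on spaces and count the values
$firstData = $lines | Where-Object { $_ -notlike '#*' } | Select-Object -First 1
$dataCount = ($firstData -split ' ').Count

'{0} column names, {1} values in the first data line' -f $columns.Count, $dataCount
```

If both numbers match, a single space is a plausible delimiter and the #Fields line can serve as the list of header names.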

Once you know this, use Import-Csv, the fast built-in PowerShell parser for CSV, to quickly parse and turn your log file into objects. All you need to do is to tell Import-Csv where your log file differs from the standard CSV format features:

$Path = "c:\logs\l190202.log"

Import-Csv -Path $Path -Delimiter ' ' -Header date, time, s-ip, cs-method, cs-uri-stem, cs-uri-query, s-port, cs-username, c-ip, csUser-Agent, csReferer, sc-status, sc-substatus, sc-win32-status, time-taken

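In this example, -Delimiter tells Import-Csv that the delimiter is a space. Since the file has no header line, -Header supplies the column names, taken from the #Fields comment at the beginning of the log. Note that the parentheses in cs(User-Agent) and cs(Referer) are dropped here because unquoted parentheses have a special meaning in PowerShell; quote the names if you want to keep them.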
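
Depending on your PowerShell version, the comment lines at the top of the log may come back as data rows when -Header is specified. The following is a minimal sketch of one way around that, under stated assumptions: it writes the sample lines to a temporary file (the file name iis-sample.log and the variables $header and $rows are made up for illustration), drops every line that starts with #, and hands the rest to ConvertFrom-Csv, which accepts the same -Delimiter and -Header parameters as Import-Csv.

```powershell
# Write the sample log to a temporary file (hypothetical file name, for illustration only)
$Path = Join-Path ([System.IO.Path]::GetTempPath()) 'iis-sample.log'
@(
    '#Software: Microsoft Internet Information Services 10.0'
    '#Version: 1.0'
    '#Date: 2018-02-02 00:03:04'
    '#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) cs(Referer) sc-status sc-substatus sc-win32-status time-taken'
    '2021-05-02 00:00:04 10.10.12.5 GET /Content/anonymousCheckFile.txt - 8530 - 10.22.121.248 - - 200 0 0 0'
    '2021-05-02 00:00:04 10.10.12.5 GET /Content/anonymousCheckFile.txt - 8531 - 10.22.121.248 - - 200 0 0 2'
) | Set-Content -Path $Path

# Same column names as in the Import-Csv call above
$header = 'date', 'time', 's-ip', 'cs-method', 'cs-uri-stem', 'cs-uri-query', 's-port',
          'cs-username', 'c-ip', 'csUser-Agent', 'csReferer', 'sc-status', 'sc-substatus',
          'sc-win32-status', 'time-taken'

# Drop comment lines first so they cannot be mistaken for data rows
$rows = Get-Content -Path $Path |
    Where-Object { $_ -notmatch '^#' } |
    ConvertFrom-Csv -Delimiter ' ' -Header $header

# Property names that contain hyphens must be quoted when accessed
$rows |
    Where-Object { $_.'sc-status' -eq '200' } |
    Select-Object -Property date, time, 'cs-uri-stem', 'time-taken'

# Clean up the temporary file
Remove-Item -Path $Path
```

All values arrive as strings, so convert them (for example with [int]) before doing numeric comparisons or sorting.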

If you don’t know the header names of your log, simply make up your own array of names, or pass a range of numbers to -Header:

Import-Csv -Path $Path -Delimiter ' ' -Header (1..50)

This uses the numbers 1 through 50 as the column names of your log file.
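
Numbered columns are read like any other property, except that the names need quotes. The following is a small sketch, assuming a single data line from the sample log above held in a made-up variable $line; it uses ConvertFrom-Csv so that no file is needed.

```powershell
# One data line from the sample log above
$line = '2021-05-02 00:00:04 10.10.12.5 GET /Content/anonymousCheckFile.txt - 8530 - 10.22.121.248 - - 200 0 0 0'

# The range 1..50 is converted to the strings '1' through '50' and used as column names
$row = $line | ConvertFrom-Csv -Delimiter ' ' -Header (1..50)

# Numeric property names must be quoted when accessed
$method = $row.'4'    # fourth column: the request method
$status = $row.'12'   # twelfth column: the status code

'Method: {0}, Status: {1}' -f $method, $status
```

Once you have identified what each number stands for, you can replace the range with meaningful names.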

