Internet proxy log analysis preprocessing

Sat, 28 May 2011 01:01PM

Proxy logs need a bit of work before you can start analysing their content. This is of course assuming you don't have a fancy product to do all this work for you ;). First, you need to work out a regular expression that matches a line in the proxy log, so that each line can be parsed into a nicer format such as CSV. A lot of the resulting columns can probably be dropped; the most useful ones are the URL, the date and time, the user agent string (to work out which browser the user was using, for example) and the request status code (to work out whether the user was able to access the content or whether it was blocked, unavailable, etc.).
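As a rough sketch of that first step, here is how the parsing could look in Python. The line layout is an assumption on my part: it is modelled on the "combined" access-log style, and your proxy will almost certainly differ, so treat LINE_RE, KEEP and log_to_csv as hypothetical names to adapt rather than anything standard.

```python
import csv
import io
import re

# Assumed layout (hypothetical, modelled on the "combined" access-log style):
# client - user [timestamp] "METHOD URL PROTOCOL" status size "referer" "user agent"
LINE_RE = re.compile(
    r'(?P<client>\S+) \S+ (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

# Only the columns that matter for the analysis are written out.
KEEP = ["timestamp", "url", "status", "user_agent"]


def log_to_csv(lines, out):
    """Write the KEEP columns of each matching line to `out` as CSV.

    Returns the number of lines that did not match the pattern.
    """
    writer = csv.DictWriter(out, fieldnames=KEEP, extrasaction="ignore")
    writer.writeheader()
    skipped = 0
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            skipped += 1  # count malformed lines instead of failing
            continue
        writer.writerow(match.groupdict())
    return skipped


# Made-up sample data, purely for illustration.
sample = [
    '10.0.0.5 - alice [28/May/2011:13:01:00 +0100] '
    '"GET http://example.com/index.html HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (Windows NT 6.1)"',
    'this line is malformed',
]

buffer = io.StringIO()
skipped = log_to_csv(sample, buffer)
print(buffer.getvalue())
print("skipped lines:", skipped)
```

Counting the lines that fail to match is worth keeping: a high number usually means the regular expression does not fit the log format yet. For a real log you would pass an open file instead of the sample list, since the function only needs something it can iterate over line by line.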

