Monday, March 16, 2015

Automate your Logfile analysis for SEO with common tools

Fully automated log analysis with tools many use all the time

Certainly no substitute for Splunk and its algorithms and features, but very practical, nearly zero cost (take that!) and highly efficient. It requires mainly free tools (thanks, Cygwin) or standard system tools (like the Windows Task Scheduler), plus a bit of trial and error. (I also use Microsoft Excel, but other spreadsheet programs should work as well.)

Analysis of large logfiles, daily

Analyzing logfiles for bot and crawler behavior, and also to check site quality, is quite helpful. So how do we analyze our huge files? For just one part of the site, we're talking about many GB of logs, even zipped.

Not that hard, actually, although it took me a while to get all these steps lined up and synchronized.

With the Windows Task Scheduler I schedule a few steps overnight:
  • Copy the previous day's logfiles to a dedicated computer
  • grep the relevant entries into a variety of files (all 301, bot 301, etc.)
  • Count the lengths of these files (wc -l) and append the values to a table (a CSV file) that tracks these numbers
  • Delete the logfiles
  • Copy the resulting table and one or two of the complete files (all 404.txt) to a server hosting an Excel file, which uses the txt file as its database and updates its graphs and tables on open
  • Delete the temporary files (and this way avoid the dip you see)
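
The filter-and-count steps above can also be sketched in a few lines of Python, as an alternative to the grep and wc -l combination I actually use. This is a minimal sketch under assumptions the post does not confirm: the logs are in the Apache/NGINX "combined" format, bots are recognized by a crude user-agent pattern, and all function and column names are made up for illustration.

```python
import csv
import gzip
import re

# Assumption: "combined" log format, where the status code directly
# follows the quoted request line, e.g. ... "GET / HTTP/1.1" 301 ...
STATUS_RE = re.compile(r'" (\d{3}) ')
# Assumption: a crude pattern standing in for a real list of bot user agents.
BOT_RE = re.compile(r"bot|crawler|spider", re.IGNORECASE)


def open_log(path):
    """Open a plain or gzipped logfile as text, to be read line by line."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


def user_agent(line):
    """Return the last quoted field, which is the user agent in combined format."""
    parts = line.rsplit('"', 2)
    return parts[1] if len(parts) == 3 else ""


def count_entries(lines):
    """Count all 301s, 301s served to bots, and all 404s."""
    counts = {"all_301": 0, "bot_301": 0, "all_404": 0}
    for line in lines:
        match = STATUS_RE.search(line)
        if match is None:
            continue  # skip lines that do not look like log entries
        status = match.group(1)
        if status == "301":
            counts["all_301"] += 1
            if BOT_RE.search(user_agent(line)):
                counts["bot_301"] += 1
        elif status == "404":
            counts["all_404"] += 1
    return counts


def append_counts(csv_path, day, counts):
    """Append one row per day to the tracking table (a CSV file)."""
    with open(csv_path, "a", newline="") as f:
        csv.writer(f).writerow(
            [day, counts["all_301"], counts["bot_301"], counts["all_404"]]
        )
```

A nightly run would then be something like `append_counts("counts.csv", "2015-03-15", count_entries(open_log("access.log.gz")))`, where both file names are placeholders. Reading the file line by line keeps memory use flat even for logs of many GB.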

Now our team can quickly check whether an issue has come up and we need to take a closer look, or not.
As a second step, I also added all log entries resulting in a 404 to the spreadsheet, loaded on open.
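
For the 404 entries, a text file with one row per entry is easy for a spreadsheet to pull in. The following is a rough sketch only, again assuming the "combined" log format; the field selection, the tab-separated layout and the function names are my illustration here, not a description of the actual all 404.txt file.

```python
import csv
import re

# Assumption: "combined" log format; referer and user agent are optional
# so that plain "common" format lines still parse.
LOG_RE = re.compile(
    r'\[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<url>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def extract_404_rows(lines):
    """Return (time, url, referer, user agent) for every 404 entry."""
    rows = []
    for line in lines:
        match = LOG_RE.search(line)
        if match is None or match.group("status") != "404":
            continue
        rows.append(
            (
                match.group("time"),
                match.group("url"),
                match.group("referer") or "",
                match.group("agent") or "",
            )
        )
    return rows


def write_404_report(rows, out_path):
    """Write the rows as tab-separated text for the spreadsheet to import."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["time", "url", "referer", "user_agent"])
        writer.writerows(rows)
```

Keeping the referer next to the URL makes it easier to tell broken internal links apart from bad inbound links when looking at the table.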

