Monday, March 16, 2015

Automate your Logfile analysis for SEO with common tools

Fully automated log analysis with tools many of us use all the time

Certainly no substitute for Splunk and its algorithms and features, but very practical, near zero cost (take that!) and highly efficient. It mainly requires free tools (thanks, Cygwin) or standard system tools (like the Windows Task Scheduler), plus a bit of trial and error. (I also use Microsoft Excel, but other spreadsheet programs should work as well.)

Analysis of large logfiles, daily

Analyzing logfiles for bot and crawler behavior, but also to check site quality, is quite helpful. So, how do we analyze our huge files? For one part of the site alone, we're talking about many GB of logs, even zipped.
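As a side note, zipped logs can be searched without unpacking them first. A minimal sketch, assuming gzip-compressed files and zgrep (which ships with gzip, including under Cygwin); the filename is a made-up example:

```shell
# build a tiny gzipped sample logfile (hypothetical name and content)
printf 'GET /x 301\nGET /y 200\n' | gzip > access-2015-03-15.log.gz

# zgrep searches inside the compressed file directly,
# so huge logs never need to be unpacked to disk
zgrep " 301" access-2015-03-15.log.gz | wc -l
```

The same trick works for the grep steps below, which keeps the disk footprint of the nightly run small.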

Not that hard, actually, although it took me a while to get all these steps lined up and synchronized.

With the Windows Task Scheduler I schedule a few steps over night:
  • copy the last day's logfiles to a dedicated computer
  • grep the respective entries into a variety of files (all 301s, bot 301s, etc.)
  • count the file lengths (wc -l) and append the values to a table (csv file) tracking these numbers
  • delete the logfiles
  • copy the resulting table and one or two of the complete files (e.g. all404.txt) to a server, which hosts an Excel file that uses the txt file as its database and updates graphs and tables on open
  • delete temporary files (and this way avoid the dip you see)
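The grep-count-append-clean core of the steps above can be sketched roughly as follows. All paths, filenames, and log format are hypothetical placeholders, not the actual setup; a tiny fake logfile stands in for the copied logs so the sketch runs end to end:

```shell
#!/bin/sh
# Hypothetical working directories for the nightly run
LOGDIR=./logs            # where yesterday's logfiles were copied
OUTDIR=./reports         # where the grep results and the csv live

mkdir -p "$LOGDIR" "$OUTDIR"

# A tiny fake logfile (made-up format) standing in for the real logs
cat > "$LOGDIR/access.log" <<'EOF'
GET /a 200 Mozilla
GET /b 301 Googlebot
GET /c 404 Mozilla
GET /d 301 Mozilla
EOF

# grep the respective entries into a variety of files
grep " 301 " "$LOGDIR/access.log" > "$OUTDIR/all301.txt"
grep " 301 " "$LOGDIR/access.log" | grep -i "bot" > "$OUTDIR/bot301.txt"
grep " 404 " "$LOGDIR/access.log" > "$OUTDIR/all404.txt"

# count the file lengths (wc -l) and append them to the tracking csv;
# tr strips the leading whitespace some wc implementations emit
all301=$(wc -l < "$OUTDIR/all301.txt" | tr -d ' ')
bot301=$(wc -l < "$OUTDIR/bot301.txt" | tr -d ' ')
all404=$(wc -l < "$OUTDIR/all404.txt" | tr -d ' ')
echo "$(date +%Y-%m-%d),$all301,$bot301,$all404" >> "$OUTDIR/tracking.csv"

# delete the logfiles and the temporary grep results;
# tracking.csv and all404.txt are kept for the Excel file
rm "$LOGDIR/access.log"
rm "$OUTDIR/all301.txt" "$OUTDIR/bot301.txt"
```

Scheduled nightly via the Task Scheduler (with a Cygwin bash as the program to run), this leaves just the growing csv plus the full 404 list for the spreadsheet to pick up.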

Now our team can quickly check whether we have an issue and need to take a closer look, or not.
In a second step I also added all log entries resulting in a 404 to the spreadsheet on open.


Thursday, March 5, 2015

Who has H1?

Do we need an H1 on our homepage?

Sometimes it is necessary to convince PMs, devs, and stakeholders that SEO efforts are necessary. One way to support this is to quickly run a test on competitors and/or on ... the top pages on the web. (Yes, after all the other pro arguments have been given.) Especially since we're running one of the top pages ourselves, that list contains powerful names. (And the first URL is in my tests because I know what's happening there, not because it is in the top 1 million .. yet ;-) )

So, H1 or not?

Out of the top 1000 pages, a large share have an H1 on their homepage. Here is a screenshot of the top pages that do.

This is the script, running over the top 1000 URLs from the Alexa top 1 million. It is very easy to adjust for other page elements.

echo -e "url\thas H1" > 2top1kH1.txt
while read -r line; do
  echo "$line"
  # -w 3: wait between retries, -t 3: three tries per URL
  h1yes=$(wget -qO- -w 3 -t 3 "$line" | grep "<h1" | wc -l)
  if [ "$h1yes" -gt 0 ]; then
    echo -e "$line\tyes" >> 2top1kH1.txt
  fi
done < "$1"

Not large, not complicated, but very convincing.