Wednesday, June 24, 2015

the script: 8 different user agents and how sites deal with it

User agent analysis script

And mentioned in the earlier post - a script helped me to grab the info on this post on how sites and google specifically treat various browsers.

While there's a lot more to analyse, much of it manually, I wanted to first see if there is an indication of differences - so for first insight I use just a plain wc -l to get characters, words, lines of the response, and it looks like there is a clear pattern. 

So, let's take a look at the source, two nested "read " loops. The outer loop through the urls, the inner loop through the agents:

#check if the file exists
if [[ ! -e $1 ]]; then
 echo -e "there's no file with this name"
echo -e "agent \t url \t  bytes \t words \t lines" > $outfile
# add a http to urls that don't have it
while read -r line; do

if [[ $line == http://* ]]; then
newline="$line" else
#  loop through agents. then read output into variables with read "here" <<<
          while read -r agent; do
               read filelines words chars <<< $(wget -O- -t 1 -T 3 --user-agent "$agent" "$newline"  2>&1| wc)
         echo -e "$agent \t $line \t $filelines \t $words \t $chars" >> $outfile
done < $2
done < $1
wc -l $outfile
Most difficult part was to get the wc output into separate variables, thanks stackexchange for the tip with the <<< here string. 

Thursday, June 4, 2015

Speed: Data on top 1000 header load time vs full load time

Lots of tools give a different number for the speed of a site, how it is for users, over different channels, providers, including rendering time or not, including many elements or not.

This is the 'hardest' test of all:

  • With a little script I checked the top 1000 pages from the Alexa list*. The first script tests how long it takes to get the http header back (yes, document exists, and yes, I know where it is and it is OK), These are the blue dots.
  • The second script downloads ALL page elements of the homepage, including images, scripts, stylesheets, also from integrated third party tools like enlighten, tealeaf, omniture, or whatever a site uses. These are the orange dots.

First I ran this without time limit, and when I checked back the next day, it was still running, so I set the timeout to 20 seconds.

There seems to be a clear connection between header response time and full time, not so much between rank in the top 1000 by traffic and speed.

sorted by full download time

sorted by traffic rank

There also seems to NOT be a clear connection on how rank by traffic (x-axis) correlates to full download time. This shows that we have a great opportunity to outperform many other companies with faster download speeds.

* Alexa (owned by Amazon) publishes the top 1 Million websites by traffic, globally, based on data from  many different browser plugins plus from Amazon (cloud?) services.
Bookmark and Share