Tuesday, April 7, 2015

Including Google: 8 agents - and average response code


Agents, not spies

Agents - user agents - play an important role in the setup and running of sites, likely a bigger one than many developers would like. Exceptions, adjustments, fixes: it is arguably impossible to generate code that displays properly on all user agents and is also W3C compliant.

So, sites care, and likely Google cares as well. Google - like any site - offers only one interface to their information (search): source code. How the page looks, how it is rendered, the tool that renders it - that is not Google; it is not provided, supported, maintained or owned by Google, but something on the client side. Not even the transport is provided by Google. Google only provides a stream of data via HTTP or HTTPS, and anything after that is not their business.

So, my question was: do user agents make a difference?

And sure, what better to use than a little script? There are so many different user agents, and I wanted to keep this open to changes, so I loop over a file with user agents - one per line - as well as over a list of URLs, and fetch the response from each site with each agent. A sketch of the idea follows below.
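Something like this minimal Python sketch - it is only an illustration of the approach, not the original script. The file name user_agents.txt, the example URL list and the choice of metrics (bytes, lines, words) are my assumptions here.

#!/usr/bin/env python3
# Sketch: fetch each URL once per user agent and compare the responses.
import urllib.request
import urllib.error

def fetch(url, user_agent):
    """Return the response body for url, requested with the given User-Agent."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # Some sites answer "unwelcome" agents with an error page; keep that body too.
        return err.read().decode("utf-8", errors="replace")

def main():
    # One user agent string per line, e.g. "Googlebot/2.1 (+http://www.google.com/bot.html)"
    with open("user_agents.txt") as f:
        agents = [line.strip() for line in f if line.strip()]

    urls = ["http://www.google.com/", "http://www.bing.com/"]  # example list

    for url in urls:
        for agent in agents:
            body = fetch(url, agent)
            lines = body.count("\n") + 1
            words = len(body.split())
            print(f"{url}\t{agent[:40]}\t{len(body)} bytes\t{lines} lines\t{words} words")

if __name__ == "__main__":
    main()

Even run against just a handful of sites, output like this makes it easy to see whether the source code changes with the user agent.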

Surprising results: Bingbot and Googlebot receive the smallest files - not wget or an unknown bot. The graph overstates this, but the table shows there is a clear difference.




Google.com answering

Now let's take a look at the response Google.com sent back to the little script. They do NOT like wget or andi-bot, nor Googlebot or Bingbot, although those two perform a bit better than the first pair.
The focus is clearly on the regular user agents. The picture for word counts is the same as for line counts.

Google only provides a data stream; rendering happens on the client side. So whether I use Firefox, IE, Safari, wget or curl is my business - and not Google's. I understand that Google does not want people to scrape large amounts of data off their site - which I don't do - but I am surprised the 'block' is not a bit more sophisticated.


