Wednesday, July 6, 2016

Do spammers use mbox A/B testing or multivariate testing more than other sites?

A/B testing, multivariate testing and SEO

Many companies use various products for A/B or multivariate testing, and perhaps for personalization as well. If testing is OK, would a spammer not use a variant testing tool to cloak content for Google?

SEOs know that bots or crawlers should not be served different content than users, especially when this is done based on cookies or user agent ('cloaking'). My understanding of Google's position on testing is that it is good for sites and for usability, and that as long as a test is limited in scope and run time, it 'should' be OK. That also means that a test that runs too long, changes too much, or affects too many pages is not OK, and perhaps even a short-term, small-scope test is not safe.

Can using a testing tool hurt our rankings in Google? 

For the research, I analysed sites using a specific testing tool that adds elements in an 'mbox' on the page; it is one of the larger tools, capable of large-scale implementations. If a larger percentage of spammers used the tool, it could indicate that variant testing tools are being used for cloaking (assuming spammers measure impact and adjust). Other tools are excluded for now.

Spammers vs other sites: use of variant testing tools

  • A full 30% of the top 1000 sites (counting only those returning a 200 status) have an mbox on their homepage.
  • Only 7% of the last 1000 sites from the Alexa 1 million have one.
  • The spammer list showed 24 sites with an mbox on the homepage out of a total of 528 domains, about 4.5% of the suspect spam list.
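
As a quick check on the arithmetic, here is a minimal Python sketch that recomputes the spam-list share from the counts given above (24 of 528 works out to roughly 4.5%) and compares it with the two Alexa percentages. The function and variable names are made up for illustration, and the Alexa values are the rounded percentages from the list, since the underlying counts are not given.

```python
def share(hits, total):
    """Percentage of sites in a group that have an mbox on the homepage."""
    return 100.0 * hits / total

# Counts for the spam list as stated in the post.
spam_share = share(24, 528)

# Rounded percentages as stated in the post (raw counts are not given).
alexa_top_share = 30.0
alexa_last_share = 7.0

print(f"spam list:       {spam_share:.1f}%")
print(f"top 1000 / spam:  {alexa_top_share / spam_share:.1f}x")
print(f"last 1000 / spam: {alexa_last_share / spam_share:.1f}x")
```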

How to use this result

Even with spammers using mboxes, this does NOT indicate that the tool is used for spam, for several reasons. Sites on the list might not be spam sites; sites might use the testing tool for legitimate reasons rather than for spamming; or they might not use the tool at all even though they have an mbox element on their site, for example one built with self-made JavaScript. Lastly, if the tool were a good one for spammers to use, mbox usage on the spam list would likely be higher than average, but it is lower than in both Alexa samples (30% and 7%).

The resulting list is still interesting as a selection of sites that deserve more scrutiny - a manual deep dive to learn about the various uses of the mbox tool for A/B or multivariate testing. 

Process - how to replicate this test

First I pulled the Alexa 1 million list and split out the top 1000 sites and the last 1000. Then I looked for a downloadable list of spam domains, as I could not find a list of sites known for cloaking, and this one looked pretty good. It is just the list of hosts they consider spam for their own site, but as a first test that's good enough for now.
Then I downloaded all elements of each homepage (spanning hosts, to include scripts from other subdomains and similar) and checked with a small script whether an mbox was integrated in any of the files downloaded with the homepage. To calculate the percentage of mbox sites for each group, I excluded the sites that did not deliver a 200 OK.
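
The check described above could look roughly like the following Python sketch. It is not the original script: the marker strings (mboxCreate, mboxDefine, mboxDefault, mbox.js) are assumptions about what an mbox integration typically leaves in a page, and the function names and sample data are made up for illustration. It assumes the homepage files have already been downloaded and are passed in as text, together with the HTTP status of the homepage.

```python
import re

# Assumed markers: the post does not say which strings the original
# script searched for, so treat this pattern as a starting point only.
MBOX_PATTERN = re.compile(
    r"mboxCreate\s*\(|mboxDefine\s*\(|mboxDefault|mbox\.js",
    re.IGNORECASE,
)

def has_mbox(file_contents):
    """True if any file downloaded with a homepage contains an mbox marker."""
    return any(MBOX_PATTERN.search(text) for text in file_contents)

def mbox_share(sites):
    """Percentage of homepages with an mbox.

    sites: iterable of (status_code, file_contents) pairs, one per homepage.
    Only homepages that answered 200 OK count toward the denominator.
    """
    ok = [contents for status, contents in sites if status == 200]
    if not ok:
        return 0.0
    hits = sum(1 for contents in ok if has_mbox(contents))
    return 100.0 * hits / len(ok)

# Hypothetical sample data standing in for downloaded homepages.
sample = [
    (200, ['<div class="mboxDefault"></div>', "mboxCreate('home_hero');"]),
    (200, ["<html><body>No testing here</body></html>"]),
    (404, ["<script src='/js/mbox.js'></script>"]),  # excluded: not a 200
    (200, ["<html></html>", "console.log('plain script');"]),
]

print(f"{mbox_share(sample):.1f}% of reachable sample homepages contain an mbox")
```

A plain string match like this will also flag sites that only carry a self-made mbox element, which is one reason the resulting list still needs a manual look.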

If you have a better spam domain list or even domains known for cloaking, please share. 

