Wednesday, June 24, 2015

the script: 8 different user agents and how sites deal with them

User agent analysis script

As mentioned in the earlier post, a script helped me grab the info for that post on how sites, and google specifically, treat various browsers.

While there's a lot more to analyse, much of it manually, I first wanted to see whether there is any indication of differences. So for a first insight I just use a plain wc to get the lines, words and bytes of each response, and it looks like there is a clear pattern.

So, let's take a look at the source: two nested "while read" loops. The outer loop goes through the urls, the inner loop through the agents:

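#!/bin/bash
# $1 = file with one url per line, $2 = file with one user agent per line
# check that both files exist, and stop if one is missing
if [[ ! -e $1 || ! -e $2 ]]; then
    echo "there's no file with this name" >&2
    exit 1
fi
outfile=$RANDOM-agentdiff.txt
# wc prints lines, words, bytes, in that order
echo -e "agent \t url \t lines \t words \t bytes" > "$outfile"
# outer loop: one url per line
while read -r line; do
    # add a http:// to urls that don't have a scheme yet
    if [[ $line == http://* || $line == https://* ]]; then
        newline="$line"
    else
        newline="http://$line"
    fi
    # inner loop through agents, then read the wc output into variables with a "here string" <<<
    while read -r agent; do
        # -q keeps wget's own log messages out of the counts
        read -r filelines words chars <<< "$(wget -q -O- -t 1 -T 3 --user-agent "$agent" "$newline" | wc)"
        echo -e "$agent \t $line \t $filelines \t $words \t $chars" >> "$outfile"
    done < "$2"
done < "$1"
wc -l "$outfile"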
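
To try out the nested-loop structure without hitting any real site, here is a minimal sketch that only prints each agent/url combination. The two temp files and their contents (example.com, AgentA/1.0 and so on) are made up for the demo and are not the lists used for the post. The prefix check is written as a case statement instead of [[ ]], which is just one way to do it.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in input files; names and contents are invented for this demo.
urls_file=$(mktemp)
agents_file=$(mktemp)
printf '%s\n' 'example.com' 'http://example.org' > "$urls_file"
printf '%s\n' 'AgentA/1.0' 'AgentB/2.0' > "$agents_file"

pairs=0
# Outer loop: one url per line.
while read -r url; do
    # Prefix a scheme only when the line does not already carry one.
    case "$url" in
        http://*|https://*) ;;
        *) url="http://$url" ;;
    esac
    # Inner loop: the agents file is re-read from the top for every url.
    while read -r agent; do
        printf '%s\t%s\n' "$agent" "$url"
        pairs=$((pairs + 1))
    done < "$agents_file"
done < "$urls_file"

rm -f "$urls_file" "$agents_file"
echo "combinations: $pairs"
```

Because each loop gets its input from its own redirection after "done", the inner read never eats lines from the url file.
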
The most difficult part was getting the wc output into separate variables; thanks to stackexchange for the tip with the <<< here string.
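
The here string trick can be seen in isolation with a small sketch. The printf below is only a stand-in for a downloaded page, and the variable names are chosen for the demo.

```shell
#!/usr/bin/env bash
# Stand-in for a response body: two lines, three words, six bytes.
counts=$(printf 'a b\nc\n' | wc)

# read splits the wc output on whitespace and fills the three variables in order.
read -r lines words bytes <<< "$counts"

echo "lines=$lines words=$words bytes=$bytes"
```

Piping straight into read (wc | read ...) would not work the same way in bash by default, because each part of a pipeline runs in a subshell and the variables would be gone once the pipeline ends. The here string keeps read in the current shell.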
