andreas.wpv: the script: 8 different user agents and how sites deal with it

Wednesday, June 24, 2015

User agent analysis script

As mentioned in the earlier post, a script helped me grab the data for that post on how sites, and Google specifically, treat various browsers.

While there's a lot more to analyse, much of it manually, I wanted to first see if there is any indication of differences. For a first insight I just use a plain wc (no options) to get the lines, words and bytes of each response, and it looks like there is a clear pattern.
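
As a quick illustration of what plain wc reports, here is a minimal sketch on two made-up response bodies (the strings are invented for the example, not real site output, and count_response is a hypothetical helper). Called without options, wc prints lines, words and bytes in that order, so different counts for the same url hint that the server sent something different.

```shell
#!/bin/bash
# Two made-up response bodies standing in for what two user agents might receive.
body_a=$'<html>\n<p>hello world</p>\n</html>'
body_b=$'<html>\n<p>hello mobile world</p>\n</html>'

# count_response is a hypothetical helper: prints "lines words bytes" for its argument.
count_response() {
    # printf '%s\n' adds a trailing newline so the last line is counted too;
    # awk only squeezes the padding that wc puts around the numbers.
    printf '%s\n' "$1" | wc | awk '{print $1, $2, $3}'
}

counts_a=$(count_response "$body_a")
counts_b=$(count_response "$body_b")
echo "agent A: $counts_a"
echo "agent B: $counts_b"
```

If the two lines differ, the responses differ; what exactly changed still needs a manual look.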

So, let's take a look at the source: two nested "while read" loops. The outer loop runs through the urls, the inner loop through the agents:

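#!/bin/bash
# usage: <script> <file with urls> <file with user agents>

# check if both input files exist, stop if not
if [[ ! -e $1 || ! -e $2 ]]; then
    echo "there's no file with this name"
    exit 1
fi

outfile=$RANDOM-agentdiff.txt
# header in the same order as wc prints its counts
echo -e "agent \t url \t lines \t words \t bytes" > "$outfile"

# outer loop through the urls
while read -r line; do

    # add a http:// to urls that don't have a scheme yet
    if [[ $line == http://* || $line == https://* ]]; then
        newline="$line"
    else
        newline="http://$line"
    fi

    # inner loop through the agents. then read wc output into variables with a here string <<<
    while read -r agent; do

        # -q keeps wget's own log out of the counts, so only the response body is measured
        read -r filelines words chars <<< "$(wget -q -O- -t 1 -T 3 --user-agent "$agent" "$newline" | wc)"

        echo -e "$agent \t $line \t $filelines \t $words \t $chars" >> "$outfile"
    done < "$2"
done < "$1"
wc -l "$outfile"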
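
To try the script, it needs two plain text files: one url per line as the first argument and one user agent string per line as the second. A minimal sketch, assuming the script was saved as agentdiff.sh (a made-up name), with placeholder urls and two sample agent strings that are not necessarily the ones used for the earlier post:

```shell
#!/bin/bash
# urls.txt and agents.txt are assumed file names; any names work.
cat > urls.txt <<'EOF'
example.com
http://www.example.org
EOF

# One user agent string per line; these two are only samples.
cat > agents.txt <<'EOF'
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0
EOF

# agentdiff.sh is a hypothetical name for the script above; run it only if it is there.
if [[ -f agentdiff.sh ]]; then
    bash agentdiff.sh urls.txt agents.txt
fi

# The output file should have one header line plus (urls x agents) data lines.
expected=$(( 1 + $(wc -l < urls.txt) * $(wc -l < agents.txt) ))
echo "expected lines in the output file: $expected"
```

The final wc -l in the script prints the line count of the output file, which makes for an easy check against this number.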

The most difficult part was getting the wc output into separate variables; thanks to Stack Exchange for the tip with the <<< here string.
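
For comparison, a small sketch of why the here string matters in bash: with default settings, piping into read runs read in a subshell, so the variables are gone afterwards, while <<< keeps read in the current shell. The sample text and variable names are made up for the example.

```shell
#!/bin/bash
sample=$'one two\nthree'

# Attempt 1: pipe into read. With bash defaults, read runs in a subshell here,
# so l1, w1 and c1 are still empty once the pipeline has finished.
l1="" w1="" c1=""
printf '%s\n' "$sample" | wc | read -r l1 w1 c1
echo "pipe:        lines='$l1' words='$w1' bytes='$c1'"

# Attempt 2: here string. read runs in the current shell and splits
# the wc output on whitespace into the three variables.
read -r l2 w2 c2 <<< "$(printf '%s\n' "$sample" | wc)"
echo "here string: lines='$l2' words='$w2' bytes='$c2'"
```

Only the second attempt leaves the three counts available for the echo that follows, which is the behaviour the script relies on.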
