User agent analysis script
As mentioned in the earlier post, a script helped me grab the data for that post on how sites, and Google specifically, treat various browsers.
While there's a lot more to analyse, much of it manually, I wanted to first see whether there is any indication of differences at all. So for a first insight I just use a plain wc to get the lines, words and bytes of each response, and it looks like there is a clear pattern.
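For reference, a plain wc on a downloaded page prints three counts, in the order lines, words, bytes. A quick illustration, with example.com standing in for a real URL:

wget -qO- http://example.com | wc
# prints three columns: lines, words, bytes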
So, let's take a look at the source: two nested "read" loops. The outer loop runs through the URLs, the inner loop through the agents:
# check if the file with the URL list exists
if [[ ! -e $1 ]]; then
    echo -e "there's no file with this name"
    exit 1
fi
outfile=$RANDOM-agentdiff.txt
echo -e "agent \t url \t lines \t words \t bytes" > "$outfile"
# add a http to urls that don't have it
while read -r line; do
    if [[ $line == http://* ]]; then
        newline="$line"
    else
        newline="http://$line"
    fi
    # loop through the agents, then read the wc output into variables with a <<< here-string
    while read -r agent; do
        # count only the page body; wget's own log output goes to /dev/null
        read -r filelines words bytes <<< "$(wget -O- -t 1 -T 3 --user-agent "$agent" "$newline" 2>/dev/null | wc)"
        echo -e "$agent \t $line \t $filelines \t $words \t $bytes" >> "$outfile"
    done < "$2"
done < "$1"
wc -l "$outfile"
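To run it, the script expects the URL list as its first argument and the agent list as its second. A quick sketch of a run, where agentdiff.sh, urls.txt and agents.txt are just placeholder names:

./agentdiff.sh urls.txt agents.txt
# writes a file named <random>-agentdiff.txt; pretty-print the tab-separated table:
cat *-agentdiff.txt | column -t -s $'\t'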
The most difficult part was getting the wc output into separate variables - thanks to Stack Exchange for the tip about the <<< here-string.
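In isolation, the trick looks like this: read splits the here-string on whitespace and assigns one field to each variable (the variable names here are just for illustration):

read -r l w b <<< "$(echo 'hello world' | wc)"
echo "lines=$l words=$w bytes=$b"   # lines=1 words=2 bytes=12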