34039 URLs tested from the 'top internet sites homepages' list, which merges the Majestic, Alexa, Statvoo, OpenDNS and Quantcast top million sites. Only the http://www homepage of each domain in this list was checked.
24407 - pages have jquery in the source code (72%)
12187 - are using schema in one form or another (36%)
114 - sites had a 'nositesearchbox' (0.3%)
The percentage of sites in the top list using schema and the number of sites with 'nositesearchbox' were the items I was particularly interested in; the jquery info is a nice added bonus.
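Each percentage above is simply the count divided by the 34039 tested URLs. A minimal sketch of that arithmetic, assuming awk is available; the helper name pct and its optional decimals argument are my own illustration, not part of the original script:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print count/total as a percentage.
# $1 = count, $2 = total, $3 = number of decimals (defaults to 0).
pct() {
    awk -v c="$1" -v t="$2" -v d="${3:-0}" \
        'BEGIN { fmt = "%." d "f%%\n"; printf fmt, 100 * c / t }'
}

pct 24407 34039      # jquery          -> 72%
pct 12187 34039      # schema          -> 36%
pct 114 34039 1      # nositesearchbox -> 0.3%
```

The third call asks for one decimal because the value would otherwise round to 0%.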
----------------
The script fetches each URL listed in a file, stores the returned page source in a variable, counts the lines in it that match each of the three given terms, and appends the counts per URL to an output file:
(only parts shown)
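while read -r line; do
    # 111 is a sentinel meaning "nothing fetched"; it is reset for every URL
    acount=111; bcount=111; ccount=111
    # -L follows redirects, -s is silent, -m caps the total time in seconds,
    # -b/-c read and write the cookie file "cookies", -A sets the user agent
    feedback=$(curl -L -s -m "$time_out" -b cookies -c cookies -A "$agent" "$line")
    if [[ $feedback ]]; then
        # grep -c counts matching lines, not individual occurrences
        acount=$(grep -i -c -- "$3" <<< "$feedback")
        bcount=$(grep -i -c -- "$4" <<< "$feedback")
        ccount=$(grep -i -c -- "$5" <<< "$feedback")
    fi
    # count a site once per term; a page with exactly 111 matching lines is skipped too
    [[ $acount -gt 0 ]] && [[ $acount -ne 111 ]] && acounter=$(( acounter + 1 ))
    [[ $bcount -gt 0 ]] && [[ $bcount -ne 111 ]] && bcounter=$(( bcounter + 1 ))
    [[ $ccount -gt 0 ]] && [[ $ccount -ne 111 ]] && ccounter=$(( ccounter + 1 ))
    printf '%s\t%s\t%s\t%s\n' "$line" "$acount" "$bcount" "$ccount" | tee -a "$outfile"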
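
One detail of the counting step is worth spelling out: grep -c reports the number of matching lines, not the number of occurrences, so a minified page that puts everything on one line yields a count of 1 at most. The sketch below shows this on a made-up page snippet without touching the network. The function name count_lines and the sample text are hypothetical, and -F (match the term as a fixed string rather than a regular expression) is my own substitution, not something the script above uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the lines of a text that contain a term,
# ignoring case, the same way grep -i -c does in the loop above.
count_lines() {
    local text=$1 term=$2
    # -F treats the term as a fixed string; -- guards terms starting with "-".
    # grep exits non-zero when nothing matches, hence the "|| true".
    printf '%s\n' "$text" | grep -i -c -F -- "$term" || true
}

# Made-up page source: the first line mentions jquery twice.
page='<script src="/js/jQuery.min.js"></script><script>jquery(function(){});</script>
<p>no match here</p>
<script src="jquery-ui.js"></script>'

count_lines "$page" "jquery"   # prints 2: two of the three lines match
count_lines "$page" "schema"   # prints 0: no line matches
```

Since the totals only ask whether a count is greater than zero, the lines-versus-occurrences difference does not change the per-term site totals; it only affects the per-URL numbers written to the output file.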