Thursday, October 31, 2013

Check the number of Google Plus ripples of the urls in your sitemap

Want to see how many ripples a page has? The easiest way to check all your pages is to use your sitemap.xml. This only works for public shares (public ripples), as Google does not show any number or information about non-public shares. It also only shows ripples for 'regular' pages, not for Google Plus posts, but who has those in their sitemap.xml anyway?

This little tool consists of three files. The scripts are shown below so you can take a look; behind the links are the source files for Linux bash:

  1. The script to pull the urls from the sitemap
  2. The script to get the ripples for each url and store the url and the number of public ripples
  3. The script combining both.

I made this into three scripts because I use sitemaps for several things, and with these modules it is easier to reuse parts, like the script to pull urls from an xml sitemap.

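1. Fetch the sitemap with wget and clean it up to keep only the urls:

#!/bin/bash
# Usage: get-urls-from-sitemap <sitemap-url>
# Writes one url per line to output-<random>; ${filename} stays set for the caller.
if [[ -z $1 ]]; then
echo 'call with parameter of url for file'
exit 1
else
filename="output-${RANDOM}"
# keep only lines with a <loc> element, then strip everything around the url
wget -qO- "${1}" | grep "<loc>" | sed -e 's/^.*<loc>//' -e 's/<\/loc>.*$//' > "${filename}"
#echo $filename
fi
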
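The sed step above keeps one url per input line, so it misses urls when a sitemap is written as a single long line. A minimal sketch for that case, assuming each <loc>...</loc> pair still sits on one line; `extract_locs` is a hypothetical helper name, not one of the three scripts. It also turns the XML entity `&amp;` back into `&`, since urls in a sitemap have to escape ampersands.

```shell
#!/bin/bash
# extract_locs (hypothetical helper): read sitemap XML on stdin,
# print one url per line, even when several <loc> elements share a line.
extract_locs() {
  # grep -o prints every match on its own line; sed strips the tags
  # and decodes the &amp; entity back to a plain ampersand.
  grep -o '<loc>[^<]*</loc>' \
    | sed -e 's/<loc>//' -e 's/<\/loc>//' -e 's/&amp;/\&/g'
}

# Example call (the sitemap address is a placeholder):
#   wget -qO- "http://www.example.com/sitemap.xml" | extract_locs > urls.txt
```

Because the function reads stdin, it can be tried on a saved sitemap file without any network access.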
2. Loop through the url list and load the ripples page for each url. Grep the right line, isolate the part with the number, and then store url and number in a csv.

#!/bin/bash
# Usage: loop-through-sitemap <file-with-one-url-per-line>
# Appends "url<TAB>number of public ripples" to <file>.csv
re='^[0-9]+$'
while read -r line
do
pull="https://plus.google.com/ripple/details?url=${line}"
# grab the "<n> public shares.<" snippet and keep only the digits of the first match
number=$(wget -qO- "${pull}" | grep -o "[0-9]*\s*public\s*shares.<" | head -n 1 | sed "s/[^0-9]//g")
# fall back to 0 when no number was found
if [[ $number =~ $re ]]; then
value=${number}
else
value="0"
fi
echo -e "$line\t$value" >> "${1}.csv"
done < "${1}"

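The parsing step can be tried without hitting the network by moving it into a small function. This is only a sketch: `share_count` is a hypothetical helper name, and it assumes the page contains a phrase of the form "12 public shares", which is what the grep pattern in the script looks for.

```shell
#!/bin/bash
# share_count (hypothetical helper): read a page on stdin and print the
# number in front of "public shares", or 0 when the phrase is missing.
share_count() {
  local n
  # keep the first match only, then drop everything that is not a digit
  n=$(grep -o '[0-9][0-9]*[[:space:]]*public[[:space:]]*shares' \
        | head -n 1 | sed 's/[^0-9]//g')
  # an empty result means no match, so report 0
  echo "${n:-0}"
}

# Example call (the page address is a placeholder):
#   wget -qO- "https://plus.google.com/ripple/details?url=http://www.example.com/" | share_count
```

Feeding it a saved copy of a ripples page is a quick way to check the pattern before running the full loop.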
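3. For easy work, use this script to call the two scripts above in the right order and get the ripples for all urls in your sitemap. One command, all done.

#!/bin/bash
# Usage: call with the url of the xml sitemap as the only parameter
if [[ -z $1 ]]; then
echo 'need input xml file'
exit 1
else
# source runs each script in this shell and returns only when it is done,
# so ${filename} set by the first script is available to the second one
source get-urls-from-sitemap "$1"
source loop-through-sitemap "${filename}"
cat "${filename}.csv"
fi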
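
The combining script relies on source so that the filename variable set by the first script is still there for the second one. A small sketch of that behaviour, using a throwaway helper file as a stand-in for get-urls-from-sitemap; the file and the value "output-demo" are made up for the demonstration.

```shell
#!/bin/bash
# Throwaway stand-in for get-urls-from-sitemap: it only sets a variable.
helper=$(mktemp)
echo 'filename="output-demo"' > "$helper"

# Run as a child process: the assignment happens in the child and is lost.
unset filename
bash "$helper"
after_child=${filename:-unset}

# Sourced ("." is the portable spelling of "source"): the assignment
# happens in the current shell and stays visible afterwards.
. "$helper"
after_source=${filename:-unset}

rm -f "$helper"
echo "child: $after_child, sourced: $after_source"
# prints: child: unset, sourced: output-demo
```

A sourced script also runs to completion before the next line starts, so no extra waiting is needed between the two steps.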

It might not be the easiest way to do this, but it works just fine. Please feel free to suggest improvements. I tested this only on Ubuntu 12.04.
