For many situations this rel canonical tag can be used. It indicates to search engines, which of the many urls for the same content should rank in search results. It is one of the well understood and well working elements in seo.
Unfortunately though, some webpages don't make full use of this. Most tools, internally and externally, show if a canonical is on a page, but they don't check anything else, particularly not if the canonical is self-referring to the url of the page on which it is implemented.
These self-referring canonicals are nothing bad, even recommended, but if we have 3 urls for the same content (think segments or customer segments) all have a self referring canonical we need to change this to one having the self referring, and the other two refer to that url with the self-referring one.
Knowledge is the parent of action or so, right?
Here is a little tool to check in your area, works under linux and also on cygwin. Suggestions for improvement welcome as always.
#bash
echo -e "\033[91m if url in feeding file had NO http or https, http is assumed and set \033[0m"
if ! [[ -f $1 ]]; then
echo "need a file with urls starting with http or https"
return
fi
echo -e "URL tested\tcanonical status"
while read line; do
if [[ ! $line == http* ]]; then
line="http://"$line
fi
response=$(curl -I -s --write-out %{http_code} "$line" -o /dev/null)
if [[ $response == 200 ]]; then
canonical=$( curl -s --max-time 3 "$line" | grep "canonical" | grep -o '<.*canonical.*>' | sed -e 's/<link //' -e 's/rel=//' -e 's/canonical//' -e 's/href=//' -e 's/ //g' -e 's/\x22//g' -e 's/\x27//g' -e 's/\/>$//' )
case $canonical in
*hreflang*)
canonical="coding error"
;;
"")
canonical="none"
;;
*/)
canonical="${canonical%/}"
;;
*)
canonical="${canonical}"
esac
if [[ $line == $canonical ]]; then
canonical="\033[32mmatch\033[0m"
fi
else canonical="redirected"
fi
echo -e "$line\t$canonical" | tee -a output.txt
done < $1
superb !
ReplyDeleteThanks Ankit. So much to do :-).
Delete