andreas.wpv: February 2016

Tuesday, February 16, 2016

site search box in SERP

Many of us site managers and technical SEOs will have thought about using the site search box in Google results. When Google displays an expanded result with sub-results (sitelinks), it offers to integrate a site search box. Posts on this are rare and have little info on its impact. What I have seen so far indicates that there is no significant impact on natural search traffic. Information about on-site conversion (a closer match should increase on-site conversion) has not been published to my knowledge, and it would be hard to separate out from overall search traffic.

There are three options to deal with this Google offer:
1. ignore, do nothing
2. block the searchbox actively
3. integrate schema into the homepage of the site to 'invite' Google to add the box.
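
For reference, options 2 and 3 each come down to a small piece of markup on the homepage. The sketch below holds both variants in variables and classifies them with grep. The domain www.example.com and the /search?q= URL are placeholders, and the helper name detect_searchbox_markup is made up for this illustration.

```shell
#!/usr/bin/env bash
# Sketch: what the "block" and "invite" markup look like, and how to spot them.
# www.example.com and the /search?q= path are placeholders, not real settings.

# Option 2: opt out of the search box with a meta tag.
blocked_html='<meta name="google" content="nositelinkssearchbox">'

# Option 3: invite the box with schema.org WebSite/SearchAction markup.
schema_html=$(cat <<'EOF'
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "url": "https://www.example.com/",
  "potentialAction": {
    "@type": "SearchAction",
    "target": "https://www.example.com/search?q={search_term_string}",
    "query-input": "required name=search_term_string"
  }
}
</script>
EOF
)

# Hypothetical helper: classify a chunk of HTML as blocked, schema, or nothing.
detect_searchbox_markup() {
  # $1 = HTML source of a page
  if printf '%s\n' "$1" | grep -q "nositelinkssearchbox"; then
    echo "blocked"
  elif printf '%s\n' "$1" | grep -q "SearchAction"; then
    echo "schema"
  else
    echo "nothing"
  fi
}

detect_searchbox_markup "$blocked_html"
detect_searchbox_markup "$schema_html"
detect_searchbox_markup "<html><head></head></html>"
```

A page that carries both kinds of markup would be reported as blocked here, because the meta tag is checked first.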

Which route to go? A first step, as often, is to look at how common each of these options is. Since I work on one of the largest websites, I compare here against the top 10,000 sites by traffic, as estimated by Alexa (top million sites report).
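
Getting from the top million report to a list of 10,000 domains can be sketched as below. This assumes the report is a CSV with one "rank,domain" pair per line. The helper name top_domains and the sample data are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: cut a "rank,domain" CSV down to the first N domains.
# Assumption: the report has lines like "1,example.com", sorted by rank.

top_domains() {
  # $1 = CSV file, $2 = how many domains to keep
  head -n "$2" "$1" | cut -d, -f2
}

# Tiny stand-in for the real report so the sketch runs on its own.
sample=$(mktemp)
printf '%s\n' "1,example.com" "2,example.org" "3,example.net" > "$sample"

# Keep the first two domains, one per line.
top_domains "$sample" 2
rm -f "$sample"
```

For the real run the second argument would be 10000, and the output would be redirected into the file that feeds the test script.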

Now showing:

how many block, use schema, do nothing,
average rank of blocked sites compared to sites with schema
what do the top sites do (surprise)
top sites blocked by rank
top sites with schema by rank
bash script to test

1. How many block, use schema, do nothing

2. Average rank of blocked sites compared to sites with schema

3. What do the top sites do (surprise)

4. Top sites blocked by rank

5. Top sites with schema by rank

6. Bash script to test

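#!/bin/bash
# Pass a file with one domain per line as the first argument.
while read -r line; do
  # Prepend www. when the domain does not already start with it
  if [[ $line != www* ]]; then
    line="www.${line}"
  fi
  # -L follows redirects (e.g. http to https); --max-time stops a slow site from stalling the run
  output=$(curl -sL --max-time 10 "${line}")
  nobox=""
  boxschema=""

  # nositelinkssearchbox meta tag: the site blocks the search box
  if grep -q "nositelinkssearchbox" <<< "$output"; then
    nobox="blocked"
  fi

  # SearchAction markup: the site invites the search box
  if grep -q "SearchAction" <<< "$output"; then
    boxschema="schema"
  fi

  echo -e "$line \t $nobox \t $boxschema" | tee -a nobox-or-box.txt

done < "$1"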
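
The counts for "how many block, use schema, do nothing" can be pulled out of nobox-or-box.txt with a few lines of awk. This is a minimal sketch, not the exact method used for the charts. The helper name summarize_results and the sample rows are made up, and a site that carries both kinds of markup is counted under both blocked and schema.

```shell
#!/usr/bin/env bash
# Sketch: tally blocked / schema / nothing from the tab-separated results file.

summarize_results() {
  # $1 = results file with lines "domain \t blocked-flag \t schema-flag"
  awk -F'\t' '
    { total++ }
    $2 ~ /blocked/ { blocked++ }
    $3 ~ /schema/  { schema++ }
    $2 !~ /blocked/ && $3 !~ /schema/ { nothing++ }
    END {
      printf "total\t%d\nblocked\t%d\nschema\t%d\nnothing\t%d\n", total, blocked, schema, nothing
    }
  ' "$1"
}

# Sample rows in the same layout the test script writes (placeholder domains).
results=$(mktemp)
printf '%s \t %s \t %s\n' \
  "www.example.com" "blocked" "" \
  "www.example.org" "" "schema" \
  "www.example.net" "" "" > "$results"

summarize_results "$results"
rm -f "$results"
```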

Thursday, February 4, 2016

Canonical checker

For search, one tag in the page head can be very important. We have many areas with duplicate pages, or at least very similar pages. That's bad news for ranking in external search engines, as the 'rank equity', or 'ranking love' as some would say, is diluted.
In many situations the rel canonical tag can be used. It tells search engines which of the many URLs for the same content should rank in search results. It is one of the well-understood and well-working elements in SEO.

Unfortunately though, some webpages don't make full use of this. Most tools, internal and external, show whether a canonical is on a page, but they don't check anything else, in particular not whether the canonical is self-referring, i.e. points to the URL of the page on which it is implemented.
Self-referring canonicals are nothing bad; they are even recommended. But if we have 3 URLs for the same content (think customer segments) and all three have a self-referring canonical, we need to change this: one URL keeps its self-referring canonical, and the other two point to that URL.
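
The three-URL case can be sketched as a small comparison. The URLs and the segment parameter are placeholders, and classify_canonical is a made-up helper, not part of the tool further down.

```shell
#!/usr/bin/env bash
# Sketch: compare a page URL with the canonical found on that page.

classify_canonical() {
  # $1 = URL of the page, $2 = canonical URL on that page (may be empty)
  # A trailing slash is ignored on both sides.
  if [ -z "$2" ]; then
    echo "none"
  elif [ "${1%/}" = "${2%/}" ]; then
    echo "self-referring"
  else
    echo "points to ${2%/}"
  fi
}

# Three URLs for the same content; the first one is chosen as the target.
target="http://www.example.com/product"
classify_canonical "http://www.example.com/product" "$target"
classify_canonical "http://www.example.com/product?segment=business" "$target"
classify_canonical "http://www.example.com/product?segment=home" "$target"
```

In the desired setup only the target URL reports self-referring, and the two segment URLs report that they point to it. If all three report self-referring, the canonicals need fixing.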

Knowledge is the parent of action or so, right?

Here is a little tool to check your area. It works under Linux and also on Cygwin. Suggestions for improvement are welcome, as always.

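#!/bin/bash
# Pass a file with one URL per line as the first argument.
echo -e "\033[91m if url in feeding file had NO http or https, http is assumed and set \033[0m"
if ! [[ -f $1 ]]; then
  echo "need a file with urls starting with http or https"
  # return works when the script is sourced, exit when it is executed
  return 1 2>/dev/null || exit 1
fi

echo -e "URL tested\tcanonical status"
while read -r line; do
  # Assume http when the URL has no scheme
  if [[ ! $line == http* ]]; then
    line="http://"$line
  fi
  # Fetch headers only and keep just the HTTP status code
  response=$(curl -I -s --write-out '%{http_code}' "$line" -o /dev/null)
  if [[ $response == 200 ]]; then
    # Pull the canonical link tag and strip everything except the URL
    canonical=$( curl -s --max-time 3 "$line" | grep "canonical" | grep -o '<.*canonical.*>' | sed -e 's/<link //' -e 's/rel=//' -e 's/canonical//' -e 's/href=//' -e 's/ //g' -e 's/\x22//g' -e 's/\x27//g' -e 's/\/>$//' )
    case $canonical in
      *hreflang*)
        # the extracted text also contains hreflang; flagged as a coding error
        canonical="coding error"
        ;;
      "")
        canonical="none"
        ;;
      */)
        # ignore a trailing slash when comparing
        canonical="${canonical%/}"
        ;;
    esac
    # Quote the right side so it is compared literally, not as a pattern;
    # a trailing slash on the tested URL is ignored as well
    if [[ "${line%/}" == "$canonical" ]]; then
      canonical="\033[32mmatch\033[0m"
    fi
  else
    # any status other than 200 lands here, not only redirects
    canonical="redirected"
  fi
  echo -e "$line\t$canonical" | tee -a output.txt

done < "$1"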