Monday, April 28, 2014

Moz's post on 'the greatest misconception in content marketing'

Rand Fishkin from Moz.com knows what he is talking about, and here is the proof for everyone to see. (Disclaimer: I read the transcript.)

Let me quote a few core points from Rand on what people and companies do wrong when using blogs for their content strategy:

"So they do a few things that are really dumb. They don't take this piece of content and put links to potentially relevant stuff on their site inside there, and they don't internally link to it well either. So they've almost orphaned off a lot of these content pieces.
You can see many people who've orphaned their blog from their main site, which of course is terrible. They'll put them on subdomains or separate root domains so that none of the link authority is shared between those.
They don't think about sharing through Google+ or building an audience with Google+, which can really help with the personalization. Nor do they think about using keywords wisely. "

So true, and over the past weeks I have found that Moz's strategy really works out well. I have compared several company blogs and news sites (multiple blogs in magazine form) on how their content performs in social media engagement - sharing on Google Plus, Facebook, Twitter and LinkedIn.

Then I took the numbers and calculated a 'social shares per piece of content' figure. Now guess which one is the blue line?
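
(In case you want to reproduce the calculation: a minimal sketch, assuming the tab-separated CSV that my share-checking script from the April 17 post further down produces - the filename is a placeholder.)

# shares per piece of content: sum all count columns (2-8), divide by the number of URLs
tail -n +2 blog-all-social-shares.csv | awk -F'\t' '
  { for (i = 2; i <= 8; i++) total += $i; pieces++ }
  END { if (pieces) printf "%.1f shares per piece of content (%d pieces)\n", total/pieces, pieces }'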




Exactly, that's Moz's company blog. Rand and his team are doing it right, as far as anyone can tell from the outside. 
The numbers are impressive:



The second-to-last row is the Moz blog; the other rows are big company content sites. The last row is not included in the graph - someone is likely using automation to push their numbers, as explained here. 

I cannot agree more with the Moz article on:
  1. Link from, and especially to, the blog posts
  2. Integrate the content blog into the on-site area for this content (we call it a 'hub')
  3. Engage on Google Plus around this content, constantly. 

Wednesday, April 23, 2014

Someone has lots of posts with the same number of shares, comments and LinkedIn shares - how does that happen?

One of my favorite projects right now is to check how competitors (to our Techpageone) are performing with their content - and one aspect of that is the social media engagement their content receives.

This time I looked at a different competitor; their site is targeted at medium-sized companies, if I am not mistaken.

As usual, I get a list of URLs with a little script; after a bit of cleanup, the list has 5,326 URLs in it.
Then I run a second script to check for social shares on Google Plus, Facebook, Twitter, StumbleUpon and LinkedIn. Then I sort - and that's where it got really interesting: see all the duplicate numbers in the Facebook and LinkedIn columns? This very much does not look like an organic, natural result. (The Twitter and ripple counts are different enough - if very low.)
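
A quick sketch of how those duplicates can be surfaced on the command line, assuming the CSV layout of my share-checking script from the April 17 post (Facebook shares are in column 4; the filename is a placeholder):

# count how often each Facebook share value occurs; many URLs sharing an
# identical value is the suspicious pattern
tail -n +2 competitor-all-social-shares.csv | cut -f4 | sort -n | uniq -c | sort -rn | head -20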




So, is the tool working? Let's do a sanity check, where I compare the numbers with those from sharetally.com. Share Tally checks a lot more (small) social media platforms, but it includes all the platforms my script checks, so its numbers should never be lower than mine, only sometimes slightly higher. That works out pretty well, as you can see:

Is the tool stuck and unable to retrieve numbers? Sanity check no. 2, this time with different URLs from the same scan:



Nope, works just fine.

Manual check:
The pages do not have a consistent (if any) canonical tag, and they seem to be genuinely different articles from various authors, published on various dates.
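
(For reference, a minimal sketch of how such a canonical check can be scripted, in the same wget style as my other scripts - urls.txt stands in for the URL list:)

# pull the canonical tag (if any) for each URL in the list
while read -r url; do
  canonical=$(wget -qO- "$url" | grep -io '<link[^>]*rel="canonical"[^>]*>' | head -1)
  echo -e "${url}\t${canonical}"
done < urls.txt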

What, then, can cause this share pattern?
These are the potential causes I can imagine (help me out if you see more!):
  • Coincidence 
  • Highly disciplined workforce sharing over and over + great internal process
  • Paid sharing 
  • Automated sharing 
What am I missing? What do you think is most likely?

Thursday, April 17, 2014

Script to check social shares: Facebook comments, shares and likes, tweets, G+ ripples, LinkedIn shares and stumbles


Social shares on Facebook, LinkedIn, Twitter and Google Plus are relevant - for users, and, we assume, for rankings in search engines as well (likely not directly, but indirectly through user behavior). There's an earlier post checking a few platforms - this one covers more than ever! %-)

This is the version I use most - it covers the biggest relevant platforms (here in the US) and pulls the data at the right speed. With my current internet connection, it pulls just slowly enough not to trigger any blocks from the APIs.

I also realized it is easiest to use a random file name, rather than pulling some info in from the scanned site. The URL list can come from a sitemap scan or a site scan - or any other URL list.

This is what the results look like:


I usually sort by each column, to filter out 'outliers', and then use the results.
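
For example, sorting the output by the Facebook shares column (column 4 in the CSV the script below writes) and looking at the top entries is as simple as this (the filename is a placeholder):

# top 20 URLs by Facebook shares (column 4), header skipped
tail -n +2 mylist.txt-all-social-shares.csv | sort -t$'\t' -k4,4nr | head -20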

This is the script:

#!/bin/bash
# Check social share counts for a list of URLs (one per line, passed as $1)
# and write them to a tab-separated CSV: "$1"-all-social-shares.csv
# To do: set all count variables to zero at the beginning, then only replace them when a value comes back
# To do: check where the stray tabs come from (likely one tab too many in one of the echos)

rm -f "${1}-all-social-shares".csv

echo -e "Url\tripples\tFB-comments\tFB-shares\tFB-likes\ttwitter\tstumble_upon\tlinkedin" > "${1}-all-social-shares".csv

while read -r shortline ; do

# encode the characters that would otherwise break the API query strings
line=$(echo "$shortline" | sed -e 's/\?/\%3F/g' -e 's/&/\%26/g' -e 's/#/\%23/g')
echo "$shortline"
echo "$line"

# Google+ ripples (public shares)
gpull="https://plus.google.com/ripple/details?url=${line}"
ripples=$(wget -qO- "${gpull}" | grep -o "[0-9]*\s*public\s*shares.<" | sed "s/[^0-9]//g" | tr "\n" "\t" | sed 's/\thttp/\nhttp/g' | sed 's/\t//')

# Facebook comments, shares and likes via the FQL link_stat table
commentpull="https://api.facebook.com/method/fql.query?query=select%20comment_count%20from%20link_stat%20where%20url=%22${line}%22&format=json"
comment_count=$(wget -qO- "$commentpull" | sed -e 's/^.*://g' -e 's/\}//g' -e 's/\(]\)//g')

sharepull="https://api.facebook.com/method/fql.query?query=select%20share_count%20from%20link_stat%20where%20url=%22${line}%22&format=json"
share_count=$(wget -qO- "$sharepull" | sed -e 's/^.*://g' -e 's/\}//g' -e 's/\(]\)//g')

likepull="https://api.facebook.com/method/fql.query?query=select%20like_count%20from%20link_stat%20where%20url=%22${line}%22&format=json"
like_count=$(wget -qO- "$likepull" | sed -e 's/^.*://g' -e 's/\}//g' -e 's/\(]\)//g')

# Twitter share count
twitterpull="http://urls.api.twitter.com/1/urls/count.json?url=${line}&callback=twttr.receiveCount"
twitternumber=$(wget -qO- "${twitterpull}" | grep -o 'count\":[0-9]*\,' | sed -e 's/count//g' -e 's/,//g' -e 's/://g' -e 's/"//g')

# StumbleUpon views
stumblepull="http://www.stumbleupon.com/services/1.01/badge.getinfo?url=${line}"
stumblenumber=$(wget -qO- "${stumblepull}" | grep -o 'views\":[0-9]*\,' | sed -e 's/views//g' -e 's/,//g' -e 's/://g' -e 's/"//g')

# LinkedIn shares
linkedpull="http://www.linkedin.com/countserv/count/share?format=json&url=${line}"
linkednumber=$(wget -qO- "${linkedpull}" | grep -o 'count\":[0-9]*\,' | sed -e 's/count//g' -e 's/,//g' -e 's/://g' -e 's/"//g')

# one tab-separated result line per URL
echo -e "${line}\t${ripples}\t${comment_count}\t${share_count}\t${like_count}\t${twitternumber}\t${stumblenumber}\t${linkednumber}" >> "${1}-all-social-shares".csv

done < "$1"
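
To run it, assuming the URL list sits in a file called urls.txt and the script was saved as social-shares.sh (both names are placeholders):

bash social-shares.sh urls.txt
# the results land in urls.txt-all-social-shares.csv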



Tuesday, April 8, 2014

Scan a site for a list of URLs - use in a sitemap or for other scans

This is the sixth version (some other versions) of this scanner that I use - I find it quite practical, as it allows me to scan folders easily and come back with a good number of URLs.

It makes a header request for each page, then stores the responses in a text file. Once the scan stops, the script keeps only the 200 responses in a separate file, then filters so that only the URLs end up in the final file.
I use a random number for the filename. Since I keep these files in a dedicated folder, this means I don't have to worry about incompatible characters in the filename, about filtering out the basename, or about duplicate filenames overwriting files if I run a scan several times - on purpose or not. I keep the intermediate text files so I can go back and check where something went wrong. Every now and then, I clean up the folder.

#!/bin/bash
# Spider the site (or folder) passed as $1 and write a deduplicated list of
# URLs that answered with 200 OK - usable as sitemap input or for other scans.
url="$1"
echo "$url"
sleep 10

# random number as file name, so repeated scans never overwrite each other
name=$RANDOM

# spider up to 10 levels deep, respect robots.txt, follow at most one redirect,
# stay below the start folder, and keep only the URL lines and the status lines
wget --spider -l 10 -r -e robots=on --max-redirect 1 -np "${1}" 2>&1 | grep -e 'http:\/\/' -e 'HTTP request sent' >> "$name"-forsitemap-raw.txt

echo "$name"

# keep each "200 OK" line plus the URL line right before it ...
grep -B 1 "200 OK" "$name"-forsitemap-raw.txt > "$name"-forsitemap-200s.txt
# ... then drop the status lines and the "--" group separators grep -B inserts
grep -v -e "200 OK" -e '^--$' "$name"-forsitemap-200s.txt > "$name"-forsitemap-urls.txt
# strip everything before the URL, deduplicate, and print with line numbers
sed -i "s/^.*http:/http:/" "$name"-forsitemap-urls.txt
sort -u -o "$name"-forsitemap-urls.txt "$name"-forsitemap-urls.txt
cat -n "$name"-forsitemap-urls.txt
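
If the goal really is a sitemap, the URL list can be wrapped into sitemap XML with a few more lines. A minimal sketch, using the standard sitemaps.org namespace; the input filename is a placeholder for whatever the scan just produced:

# wrap the URL list in sitemap XML
{
  echo '<?xml version="1.0" encoding="UTF-8"?>'
  echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
  while read -r url; do
    loc=$(echo "$url" | sed 's/&/\&amp;/g')   # escape & so the XML stays valid
    echo "  <url><loc>${loc}</loc></url>"
  done < 12345-forsitemap-urls.txt
  echo '</urlset>'
} > sitemap.xml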


Thoughts? Feedback?

Thursday, April 3, 2014

Pull data for video sitemap

Video sitemaps are sometimes helpful to make search engines aware of videos on a site. We use several systems to generate pages with videos, and as a result it is not easy to get the information from the back end to generate sitemaps. So - like Google - we have to take it from the front end as much as possible. This script and its details are likely limited to just Dell.com, and even here I have found that videos in some sections cannot be picked up this way. Still, it has been extremely helpful for finding the 'hidden' details of our video implementations. (And yes, we have had requirements in the works for a while to change these :-) ).

Elements necessary for a sitemap are:
  1. Page URL
  2. Title
  3. Keywords
  4. Description
  5. Video URL
  6. Thumbnail URL
And this script pulls them nicely off many of our video pages. (We use Open Graph tags, which makes it relatively easy to pull most of the info.) The script needs to be called with the filename of the text list of URLs as the first parameter ( . script.sh listofpages.txt):
#!/bin/bash
if [[ ! $1 ]] ; then
echo "need to call with filename"
exit 1
fi
# random file name for the tab-separated output
file=$RANDOM-sitemap-data.txt
echo "$file"
echo -e "url\tvideo\ttitle\tthumbnail\tdescription" > "$file"
while read -r line; do
# fetch the page once, then pull each Open Graph tag out of it
filecontent=$(wget -qO- "$line")
(echo "$line" | sed 's/\r$/\t/' | tr '\n' '\t' \
 && echo "$filecontent" | grep "og:video" | grep "swf" | sed -e "s/^.*content=\"//" -e "s/\".*$//" | sed 's/\r$/\t/' | tr '\n' '\t' \
 && echo "$filecontent" | grep "og:title" | sed -e "s/^.*content=\"//" -e "s/\".*$//" | sed 's/\r$/\t/' | tr '\n' '\t' \
 && echo "$filecontent" | grep "og:image" | sed -e "s/^.*content=\"//" -e "s/\".*$//" | sed 's/\r$/\t/' | tr '\n' '\t' \
 && echo "$filecontent" | grep "og:description" | sed -e "s/^.*content=\"//" -e "s/\".*$//") >> "$file"
done < "$1"
cat -A "$file"
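
Turning that tab-separated file into actual video sitemap entries is then mostly templating. A minimal sketch, assuming the column order above, leaving keywords out, and using Google's video sitemap extension namespace; the input filename is a placeholder, and titles and descriptions would still need proper XML escaping:

# build video sitemap entries from the tab-separated data
{
  echo '<?xml version="1.0" encoding="UTF-8"?>'
  echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"'
  echo '        xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">'
  tail -n +2 12345-sitemap-data.txt | while IFS=$'\t' read -r url video title thumb desc; do
    echo "  <url>"
    echo "    <loc>${url}</loc>"
    echo "    <video:video>"
    echo "      <video:thumbnail_loc>${thumb}</video:thumbnail_loc>"
    echo "      <video:title>${title}</video:title>"
    echo "      <video:description>${desc}</video:description>"
    echo "      <video:content_loc>${video}</video:content_loc>"
    echo "    </video:video>"
    echo "  </url>"
  done
  echo '</urlset>'
} > video-sitemap.xml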


As always - I use this myself, and would love to hear tips to improve it or to see other scripts for site optimization and maintenance.

