Tuesday, December 31, 2013

Reduce pictures with a script for ImageMagick

Blogging is fun, but it can be quite a bit of effort. One of the necessary chores is scaling pictures so they fit into the blog, are large and sharp enough to show the necessary details, but are also as small as possible for fast page load times.

The best results I can generate are with Photoshop, which also has a nice batch option. On Windows, IrfanView is a great tool to automate this easily with pretty good quality as well. My tool of choice on Linux is ImageMagick. While it has tons of options, the settings below work great for me.

I start this script from the folder with the pictures. It takes one parameter: the desired length of the longer side. So calling it like 'image-resize.sh 800' is the way to go.
It checks whether the target folder exists and creates it if not; then it renames all filenames in the start folder to lowercase and changes .jpeg to .jpg, so that all JPEGs are picked up by the ImageMagick loop.
#!/bin/bash
# usage: image-resize.sh <length of the longer side>, e.g. image-resize.sh 800

# create the target folder if it does not exist yet
if [[ ! -d "$1" ]]
      then mkdir "$1"
fi

# lowercase all filenames, then normalize .jpeg to .jpg
rename 'y/A-Z/a-z/' *
rename 's/\.jpeg$/.jpg/' *

# resize, sharpen and recompress every jpg into the target folder, prefixed with s_
for i in *.jpg
do
convert "$i" -resize "${1}^>" -quality 25 -unsharp 1.2x1.2+1+0 "$1"/s_"$i"
done

The convert line then reduces every picture whose longer side (height or width) is larger than 800 px to exactly 800 px. It maintains the aspect ratio, sharpens, and reduces the JPEG quality to 25 - a value I found to be the sweet spot between quality and file size for many of my pictures. The final step is to prefix the filename with s_ and write the result into the target folder. The most important insight (from a forum) was the 'value^>' geometry setting - it sets the longer side to this value.
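
To double-check what the script produced, ImageMagick's identify tool can print the new dimensions and file sizes. A minimal sketch, assuming the resized copies sit in a target folder named 800:

#!/bin/bash
# print filename, dimensions and file size for every resized picture
# (assumes the target folder created above, here ./800)
for i in 800/s_*.jpg
do
identify -format "%f: %wx%h, %b\n" "$i"
done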

Monday, December 23, 2013

Download the files linked on one page with wget - define the type

Something I rarely do - but now I had to: Download a bunch of .ogg files. I am exploring some sound capabilities of my system, and found the system sounds in /usr/share/sounds/.

Well, nothing very special there - so I started searching Google for 'free sound download filetype:ogg' and similar queries, and found a few nice sites like www.mediacollege.com.

So I downloaded 2-3 wavs by hand, and oh my, that takes time. Here's my little script:

#!/bin/bash
# recursively fetch the .wav files linked from this page into the subfolder ./download
wget -r -l2 --user-agent Mozilla -nd -A.wav "http://www.mediacollege.com/downloads/sound-effects/people/laugh/" -P download
# then announce each downloaded file and play it
for file in download/*.wav ; do echo "$file" && paplay "$file"; done

wget runs with -r for recursive, -nd so the directory structure is not rebuilt locally (the page sits 4 levels deep on the server), and -P download to save everything into the subfolder download. -A.wav only downloads .wav files (extend it to -A.wav,.ogg to also get .ogg files), and -l2 limits the recursion depth, so essentially just this page and the pages linked from it are fetched. Increasing that value can lead to huge download times and sizes, so be careful.

Once that is done, the last line just echoes each filename and then plays it (paplay works for my system; if that does not produce anything, 'aplay' might).
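
And since the original goal was .ogg files: wget's -A option takes a comma-separated list of suffixes, so a slightly broader variant of the same call (same page, just a wider filter) could look like this:

#!/bin/bash
# same download as above, but accept both .wav and .ogg files
wget -r -l2 --user-agent Mozilla -nd -A.wav,.ogg "http://www.mediacollege.com/downloads/sound-effects/people/laugh/" -P download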

Tuesday, December 17, 2013

Keyword check: ranking 100 URLs in a search engine

Do you ever need to check which pages rank on Google for a keyword or two, and find the results hard to read and especially cumbersome to copy for further use? At Dell we use large-scale tools like seoClarity and get tons of high-quality data, but I still sometimes get these one-off requests where I need a small tool, NOW.

This is a small script for the Linux bash (and should thus also run, with slight modifications, under Cygwin on Windows).

I call the script with two parameters - first the search engine, then the search term, as in
# . script.sh www.searchengine.com "searchterm" . Search terms with spaces work; just replace each space with a '+'.
Those parameters build the URL, which curl then uses to pull the results from Google. Xidel is a small command-line program that makes it super easy to filter content with XPath.

# $1 is the query url, $2 is the search term; skipping the check whether both are given, for brevity
url="http://${1}/search?q=${2}&sourceid=chrome&ie=UTF-8&pws=0&gl=us&num=100"

# pull the result page with a browser user agent, then extract the <cite> elements (the displayed urls)
curl -s -A "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3" -o temp.file "$url"
xidel temp.file -e "//cite" > urls-from-curlgoogle.csv
rm temp.file
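
A quick usage example; the search term is hypothetical, and the script name is whatever you saved it as:

# hypothetical example call: rankings for 'blue+widgets' on google.com
. script.sh www.google.com "blue+widgets"
head urls-from-curlgoogle.csv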

Thanks to Benito for Xidel, and thanks to +Norbert Varzariu and +Alexander Skwar for helpful input to fix my variable assignment.

Again, this is not meant to replace any of the excellent tools out there, free or paid, but to handle small tasks 'quick and dirty', with some accuracy but without much handling. And please use this responsibly: do not spam any search engine and do not disregard any terms of use. When I tried this I checked, and the current ToS did not seem to prevent it - but I am no lawyer and might be mistaken. At your own risk.

Thursday, December 12, 2013

"Obamacare": Public shares of Healthcare.gov on Google Plus

Public shares on Google Plus are called 'ripples', and they show how often and in which ways a post has been shared.

A while ago I made a little script to get the number of ripples for a list of URLs and showed how to use it with the Alexa top results (here).
While that was entertaining, I wondered if I could learn something from it, and the Obama administration has a bit of a reputation for being tech-savvy.

I checked www.whitehouse.gov - no sitemap. The next 'popular topic' that came to mind was 'Obamacare', so I pulled the sitemap from www.healthcare.gov (yes, they have one) and then ran the little script.
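
The ripples script itself is in the post linked above; just to sketch the first step, the URL list can be pulled out of the sitemap with xidel again. A minimal sketch, assuming the sitemap sits at the usual /sitemap.xml location (the output file name is my own choice):

#!/bin/bash
# fetch the sitemap and extract the <loc> entries into a plain list of urls
wget -q "http://www.healthcare.gov/sitemap.xml" -O sitemap.xml
xidel sitemap.xml -e "//loc" > healthcare-urls.txt
# quick count of how many urls we got
wc -l healthcare-urls.txt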

At the time of testing (Dec 11, 2013) healthcare.gov had 612 URLs in its sitemap. Only 4 of these pages had been shared publicly on G+, with a pretty good number for the homepage.

While 800-plus is far from www.google.com and www.facebook.com with their 7000+ shares, Texas.gov has only 5 public shares from what I can see, and www.whitehouse.gov only 138.


PLEASE: keep comments focused on SEO, not politics or healthcare. While those topics are overall more important than any SEO or Google Plus numbers, for this post they are off topic.

Tuesday, December 10, 2013

Bing Webmaster Tools - check for old verification files

Sometimes it is necessary to check whether domains have the right verification files in the root folder: an agency for a multitude of client sites, or a company for several domains and subdomains.

People move on, and the webmaster or SEO agency from 3 years ago is not necessarily a partner any more. Yet there might still be old verification files in the root folder, allowing access to data in Bing Webmaster Tools (BWT) for people who should no longer have it.

And this is not just about someone getting access to potentially very deep knowledge, but also about the possibility of changing settings and causing immediate monetary damage. One possible example would be a false redirect, wrong sitelinks, or a wrong target country for a site - major damage possible!

So we have to make sure that only the right meta tags and verification files are in place. These IDs are very long, and comparing them by hand is error-prone and can take quite a while. Copy-pasting into a spreadsheet works fine, but depending on the number of domains to check that also takes time.

Here I just show how to check for the verification files. They are called BingSiteAuth.xml and mainly consist of an ID inside a <user> tag, with only a little XML around it.
  1. Download the current file from BWT and note this ID.
  2. Prepare a URL list in a text file, with one line per domain / subdomain you want to check.
  3. Run this little verification file checker script:
#!/bin/bash
if [[ ! $1 ]] ; then
echo -e '\n\tplease call this script with a list of urls\n'
exit 1
else

while read -r line; do
url="${line}/BingSiteAuth.xml"

# pull the file, flatten it to one line and strip everything around the id inside the <user> tag
site_id=$(wget -qO- "$url" 2>/dev/null | tr "\n" "\t" | sed -e "s/^.*<user>//" -e "s/<\/user>.*$//")
# set a default value for site_id, then check if the gathered value matches the given value (as mentioned in the intro text)
if [[ $site_id = "" ]] ; then site_id="0" ; fi
if [ "${site_id}" == "78F80DA184A74A413....and-so-on...45671" ] ;
then match="match"
else match="\E[34;47m no match"
fi
# get it all onto one line, then reset the terminal formatting
echo -e ${line} "\t\tsite-id" ${site_id} ${match} `tput sgr0`

# alternate ending that writes into the file BingSite-check.txt for further use
#if [ "${site_id}" == "78F80DA184A74A4137F56098D9D45671" ] ;
# then match="match"
# else match="no match"
#fi
#echo -e ${line} "\t\tsite-id" ${site_id} ${match} >> BingSite-check.txt
done < "$1"
fi
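
A quick usage example; the script name and the list file name are hypothetical, the list simply contains one URL per line:

# hypothetical file names: the script saved as bing-auth-check.sh, the list as url-list.txt
echo -e "http://www.example.com\nhttp://blog.example.com" > url-list.txt
bash bing-auth-check.sh url-list.txt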