Monday, August 26, 2013

Better: Pull urls from site for sitemap with wGet

The other bash wget script works just fine, BUT I found it had one main flaw. Every time I ran it for another site I would either reuse the same filename for the file with the URLs, thereby deleting the older version, or I would have to change the filename in the script. So I changed the script.
  1. Now I can call the script with the filename of the URL list as a startup parameter. 
  2. It also checks whether it got that parameter and, if not, says so. 
  3. Finally, I use the input filename as part of the output filename, so no overwriting there either. 

#!/bin/bash
if [[ ! $1 ]]; then
  echo "need to call this with the file name with url list as argument"
  exit 1
fi
i=0
while read -r line; do
  i=$((i+1))
  wget --spider -b --recursive --no-verbose --no-parent -t 3 -4 --save-headers --output-file="wgetlog-$1-$i.txt" "$line"
done < "$1"
Slowly, but getting there :-)
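The argument check at the top can be tried on its own. Here is a minimal sketch wrapped in a function (the function name `check_arg` and the filename `url-list.txt` are just for illustration) so it can be exercised without spidering anything:

```shell
# Sketch of the script's argument guard as a reusable function
check_arg() {
  if [[ ! $1 ]]; then
    echo "need to call this with the file name with url list as argument"
    return 1
  fi
}

check_arg               # no argument: prints the usage hint, returns 1
check_arg url-list.txt  # argument given: passes silently, returns 0
```

Called without an argument it prints the hint and fails, so the main script can bail out before the wget loop ever starts.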

Monday, August 12, 2013

Clean up WGET results for sitemap

After running the script to get URLs via wget, I now have to clean them up to get just plain URLs. The URLs need to be on the right domain / folder and need to have had a 200 OK HTTP response. So now there are a bunch of text files with URLs in them, but not just URLs; there's a lot more:

2013-07-16 21:39:00 URL: [21149] -> "" [1]
2013-07-16 21:39:00 URL: [112] -> "" [1]
2013-07-16 21:39:01 URL: 200 OK

Now, that's all good to know, but not really usable for building a sitemap. Being used to working on and fixing a lot of things in Excel, that was my first try, but I quickly gave up. The only way to come close to cleaning it up was changing table to text, and even that did not clean up everything, and it took too long, too.

#!/bin/bash
# loop through the files
for i in {1..30}; do
  # pull only the lines with a URL, then all that do NOT match my domain
  grep 'URL' wgetlog$i.txt | grep -i '' > wget$i.txt
  # delete the leading date up to and including URL:, then delete the ending part
  # from the space on (including file size numbers), then remove spaces, then
  # delete lines not matching the domain
  sed -e 's/^[0-9 -:]*URL://g' -e 's/\ \[.*$//g' -e 's/\ //g' -e '/dell\.com/!d' wget$i.txt > support-url-$i.txt
done
Not that pretty, but it works fine. I just need to adjust the file name to match the one used in the wget script and set the loop count to the number of files, and then it works just fine. 
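To see what the sed chain does to a single line, here is a quick check on one hypothetical log line in the format wget's --no-verbose mode writes (the URL path and byte counts are made up for illustration):

```shell
# One made-up log line in wget's --no-verbose format
line='2013-07-16 21:39:01 URL:http://www.dell.com/support/page1 [21149] -> "" [1]'

# Same sed chain as in the script: strip the timestamp and URL: prefix,
# cut everything from the first space/bracket on, drop remaining spaces,
# and delete lines that do not match the domain
cleaned=$(echo "$line" | sed -e 's/^[0-9 -:]*URL://g' -e 's/\ \[.*$//g' -e 's/\ //g' -e '/dell\.com/!d')
echo "$cleaned"   # http://www.dell.com/support/page1
```

Lines on another domain, or lines without a URL at all, come out empty, which is exactly what the sitemap file needs.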

This is great for any site with just a couple of thousand pages and a few sitemap updates per year.

Thursday, August 1, 2013

How to optimize your Google Plus post for high CTR in notifications

Google Plus brought in top designers to give the tech company an appealing design, and here is one example of what's come out of it: notifications. 

Here is a picture of some random notifications I got while writing this post; let's take a closer look.

G+ notifications
What does each tile show me? 

1. Face or logo
That's great, if I know the person or company well enough. This favors big brands, both personal and corporate.

2. Name and 'Shared a post' 
'Shared a post' once on the flap might be good, but on each tile?

3. Copy
Looks like it shows the first ~45 to 50 characters of a post or, if the post has no copy, where it is coming from.

Since these are the notifications that alert me to new posts, the picture / logo can be used for branding, and the first 50 characters can be used to drive a click-through. 
This seems fairly similar to the browser title for pages: 'frontload' your post with a relevant keyword and a call to action to capture attention and get a click-through. 

Apart from this optimization aspect, I hope that Google will integrate more info into each box, for example swapping out the superfluous 'shared a post'. It would be great to see any of these, or a combination:

  • Topic category
  • One of the new wonderful auto-hashtags
  • A bit more copy 
  • The number of +1s or comments on the post
Each of these would help me decide which of the hundreds of posts are worth checking out. Now, getting that in there without making it look cramped would be great design. 
