Monday, August 26, 2013

Better: Pull URLs from site for sitemap with wget

The other bash wget script works just fine, BUT I found it had one main flaw. Every time I ran it for another site, I would either reuse the same filename for the file with the URLs, and so overwrite the older version, or I would have to change the filename in the script. So I changed the script.
  1. Now I can call the script with the filename of the URL list as a startup parameter.
  2. It also checks whether it got that parameter and, if not, says so.
  3. Finally, I use the input filename as part of the output filename, so no overwriting there either.

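#!/bin/bash
# Expects the file with the URL list (one URL per line) as first argument.
if [[ -z "$1" ]]; then
  echo "need to call this with the file name with url list as argument"
  exit 1   # stop here, otherwise the loop below fails on the empty filename
fi
i=0
while read -r line; do
  i=$((i + 1))   # counter for $i, so each background wget writes its own log
  wget --spider -b --recursive --no-verbose --no-parent -t 3 -4 --save-headers --output-file="wgetlog-$1-$i.txt" "$line"
done < "$1"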
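
A side note on point 3: if the URL list is passed with a path (say lists/site-a.txt), the slash and the .txt extension end up inside the log filename, and wget may then try to write into a directory that does not exist. Below is a minimal sketch of one way around that. The helper name log_name and the example path are made up for illustration and are not part of the script above.

```shell
#!/bin/bash
# Hypothetical helper: build a log filename from the URL-list path ($1)
# and a counter ($2), without directories or the list's extension.
log_name() {
  local base
  base="$(basename "$1")"   # drop any leading directories
  base="${base%.*}"         # drop the last extension, if there is one
  printf 'wgetlog-%s-%s.txt\n' "$base" "$2"
}

log_name "lists/site-a.txt" 1   # prints wgetlog-site-a-1.txt
```

Inside the loop, this could be used as --output-file="$(log_name "$1" "$i")". A list name without an extension is passed through unchanged.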
Slowly, but getting there :-)
