So, here is a small bash script that scans a given sitemap and stores just the URLs in a text file. The input parameter is the full URL of the sitemap. If we name the script sitemap-urls.sh, the call would be:
# bash sitemap-urls.sh http://www.dell.com/wwwsupport-us-sitemap.xml
The script first checks whether it was called with the sitemap URL as parameter ($1); if not, it prints a message and exits. If the parameter is set, it downloads the page without saving it, greps for the <loc> lines, and uses sed to strip the surrounding XML tags so only the URL remains.
#!/bin/bash
if [[ -z "$1" ]]; then
  echo 'call script with the sitemap url as parameter'
  exit 1
else
  wget -qO- "${1}" | grep "<loc>" | sed -e 's/^.*<loc>//' -e 's/<\/loc>.*$//' > sitemap-scan-output.txt
fi
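To see what the grep/sed part does to a single sitemap entry, here is a quick check on one made-up <loc> line (the URL is just an example, not from the Dell sitemap):
# sanity check of the pipeline on one hypothetical sitemap line
echo '<url><loc>http://www.example.com/page-1</loc></url>' \
  | grep "<loc>" | sed -e 's/^.*<loc>//' -e 's/<\/loc>.*$//'
# prints: http://www.example.com/page-1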
Not fancy, but still good to have. I will use this to check a few interesting things in upcoming posts, and it is also really helpful when the URLs are needed for import into analytics tools or Excel.
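If the list is going to be imported into Excel or an analytics tool, a small follow-up like this can help; just a sketch, using the output file name from the script above:
# count the extracted urls and turn the list into a simple one-column csv
wc -l < sitemap-scan-output.txt
( echo "url"; cat sitemap-scan-output.txt ) > sitemap-scan-output.csv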