This is a small script for Linux bash (so it should also run, with slight modifications, under Cygwin on Windows computers).
First I call the script with two parameters, the search engine and then the search term, as in
# . script.sh www.searchengine.com "searchterm"
Search terms with spaces work; just replace each space with a '+'.
That's used to build the URL, which curl then uses to pull the results from Google. Xidel is a small command-line program that makes it super easy to filter content with XPath.
# $1 is the search engine host, $2 is the search term; the check that both are given is skipped for brevity
url="http://${1}/search?q=${2}&sourceid=chrome&ie=UTF-8&pws=0&gl=us&num=100"
curl -s -A "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3" -o temp.file "$url"
xidel temp.file -e "//cite" > urls-from-curlgoogle.csv
rm temp.file
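The script above skips the parameter check and expects spaces to be replaced by hand. A minimal sketch of how both could be handled is shown here. The function names encode_term, build_url and fetch_cites are hypothetical and not part of the original script, and only spaces are encoded, so other special characters in a search term would still need care.

```shell
#!/usr/bin/env bash
# Sketch only: encode_term, build_url and fetch_cites are hypothetical
# helper names, not part of the original script.

# Replace each space in the search term with '+',
# which the post suggests doing by hand.
encode_term() {
    printf '%s\n' "$1" | tr ' ' '+'
}

# Build the query URL from the search engine host ($1) and the search term ($2).
# Prints a usage message and returns 1 if either argument is missing or empty.
build_url() {
    if [ "$#" -ne 2 ] || [ -z "$1" ] || [ -z "$2" ]; then
        echo "usage: build_url <search-engine-host> <search-term>" >&2
        return 1
    fi
    printf 'http://%s/search?q=%s&sourceid=chrome&ie=UTF-8&pws=0&gl=us&num=100\n' \
        "$1" "$(encode_term "$2")"
}

# Fetch the result page into a temporary file and print the text of every
# <cite> element, using the same curl and xidel calls as the script above.
# The temporary file is removed whether or not the fetch succeeds.
fetch_cites() {
    url=$(build_url "$1" "$2") || return 1
    tmp=$(mktemp) || return 1
    curl -s -A "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3" -o "$tmp" "$url" \
        && xidel "$tmp" -e "//cite"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

With these functions loaded, a call such as fetch_cites www.searchengine.com "search term" > urls-from-curlgoogle.csv would write the same kind of output file as the original script, without a fixed temp.file name in the working directory.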
Thanks to Benito for Xidel, and to +Norbert Varzariu and +Alexander Skwar for their helpful input on fixing my variable assignment.
Again, this is not meant to replace any of the excellent tools out there, free or paid, but to handle small tasks 'quick and dirty', with some accuracy but not too much handling. And please use this responsibly: do not spam any search engine or disregard any terms of use. I checked when I tried this, and the current ToS do not seem to prevent it, but I am no lawyer and might be mistaken. Use at your own risk.