Which bots are crawling your site?
Which bots are visiting your site? Googlebot, bingbot and Yandex? You might be surprised by the number and variety.

Script to identify crawlers
I simply filter the logfile we get from IIS with grep -i 'bot', write the user agent (field 13 in this logfile; the position depends on which fields IIS is configured to log) into a separate file, then sort the agents and count how often each one occurs:

grep -i 'bot' logfile | awk '{ print $13 }' > bots-names.txt
sort bots-names.txt | uniq -c | sort -k1,1nr > bots-which-counter.txt
rm bots-names.txt
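Field 13 only holds the user agent for this particular logging setup. As a minimal sketch, assuming W3C extended IIS logs whose "#Fields:" directive lists cs(User-Agent), the column can be looked up from that header instead of being hard-coded. count_bots is a hypothetical function name. Unlike the grep above, it only looks for "bot" inside the user agent, so an ordinary browser requesting /robots.txt is not counted, and no temporary file is needed.

```shell
# Hypothetical helper: count requests per bot user agent in an IIS W3C log.
# Assumption: the log is in W3C extended format and its "#Fields:" line
# includes cs(User-Agent). Output: "<count> <agent>", highest count first.
count_bots() {
  awk '
    # The "#Fields:" line names the columns. Its first token is "#Fields:"
    # itself, so header token i describes data column i - 1.
    /^#Fields:/ {
      ua = 0
      for (i = 2; i <= NF; i++) if ($i == "cs(User-Agent)") ua = i - 1
      next
    }
    /^#/ { next }  # skip other directives such as #Software and #Date
    # Case-insensitive match on the user agent column only (like grep -i).
    ua && tolower($ua) ~ /bot/ { count[$ua]++ }
    END { for (agent in count) print count[agent], agent }
  ' "${1:-logfile}" | sort -k1,1nr
}
```

Usage: count_bots logfile > bots-which-counter.txt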
This gives me a nice list of bots and how many requests each one sent during the period the logfile covers. It is an interesting list, with lots of requests from bots I would not have expected, such as mail.RU and 'linux'.
In another post I share a table of how often bots come back over time; I pick the most relevant bots based on the list above and on which ones actually bring us traffic.
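A table like that can be built from per-day request counts. The following is only a sketch, not the script behind the other post: it assumes W3C extended logs whose "#Fields:" line includes both date and cs(User-Agent), and bot_hits_per_day is a hypothetical helper name. The pattern is matched case-insensitively against the user agent, and several logfiles can be passed at once.

```shell
# Hypothetical helper: requests per day for user agents matching a pattern.
# Assumption: W3C extended IIS logs with "date" and "cs(User-Agent)" fields.
# Output: "<YYYY-MM-DD> <count>", sorted by date.
bot_hits_per_day() {
  local pattern="$1"
  shift
  awk -v pat="$pattern" '
    # Re-read the column positions at every "#Fields:" directive,
    # since each logfile (or log restart) writes a new header.
    /^#Fields:/ {
      d = ua = 0
      for (i = 2; i <= NF; i++) {
        if ($i == "date") d = i - 1
        if ($i == "cs(User-Agent)") ua = i - 1
      }
      next
    }
    /^#/ { next }  # skip other directives
    d && ua && tolower($ua) ~ tolower(pat) { hits[$d]++ }
    END { for (day in hits) print day, hits[day] }
  ' "$@" | sort
}
```

Usage: bot_hits_per_day googlebot logfile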
Top bots and crawlers visiting certain parts of www.dell.com:
I cut off the numbers (the count in the first column), so this is just the top few visiting crawlers/bots, sorted by number of requests.
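Dropping the count column can be done with awk on the bots-which-counter.txt file produced by the script. A small sketch follows; top_bots is a hypothetical name, and it assumes the agent strings contain no spaces (IIS writes spaces in the user agent as "+").

```shell
# Hypothetical helper: print the agent names of the top N entries
# (default 10) from the "uniq -c" output, without the count column.
top_bots() {
  # awk's default field splitting skips the leading spaces uniq -c adds,
  # so $2 is the agent string.
  head -n "${2:-10}" "${1:-bots-which-counter.txt}" | awk '{ print $2 }'
}
```

Usage: top_bots bots-which-counter.txt 10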