Attack of the bots

One out of every 5 hits to my site is from a bot. I spent a fair bit of time this weekend analysing my log file for last month (which runs to gigabytes, and I ended up learning a few things about file system optimisation, but more on that later). 80% of the hits were from regular browsers. 20% were from robots. Here's a sample of the user-agents:

Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Mediapartners-Google
DotBot/1.0.1 (http://www.dotnetdotcom.org/#info, [email protected])
Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)
msnbot/1.1 (+http://search.msn.com/msnbot.htm)
FeedBurner/1.0 (http://www.FeedBurner.com)
Mozilla/5.0 (compatible; attributor/1.13.2 +http://www.attributor.com)
WebAlta Crawler/2.0 (http://www.webalta.net/ru/about_webmaster.html) (Windows; U; Windows NT 5.1; ru-RU)
Yandex/1.01.001 (compatible; Win16; I)
...

You get the idea. The bulk of these are search engines. Over two-thirds of the bot requests were from Yahoo Slurp.

Now, this struck me as weird. If I take the top 3 search engines that are sending traffic my way, ...
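
For anyone who wants to reproduce the bot-versus-browser split on their own logs, here is a minimal sketch in Python. It is not the script used for the numbers above. It assumes the log is in the common "combined" format, where the user-agent is the last double-quoted field on each line, and it treats a hit as a bot if the user-agent contains one of a handful of marker strings. The names `BOT_MARKERS`, `user_agent`, `bot_name` and `summarise` are made up for this illustration, and the marker list is only drawn from the sample user-agents shown earlier, so it will miss plenty of real crawlers.

```python
import re
from collections import Counter

# Assumption: marker substrings taken from the sample user-agents in this post.
# A real list would need to be much longer.
BOT_MARKERS = (
    "slurp", "googlebot", "mediapartners-google", "dotbot", "twiceler",
    "msnbot", "feedburner", "attributor", "webalta", "yandex",
)

# Assumption: combined log format, so the user-agent is the last quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def user_agent(line):
    """Return the last double-quoted field of a log line, or '' if none."""
    match = UA_PATTERN.search(line)
    return match.group(1) if match else ""


def bot_name(ua):
    """Return the first marker found in the user-agent, or None for non-bots."""
    lowered = ua.lower()
    for marker in BOT_MARKERS:
        if marker in lowered:
            return marker
    return None


def summarise(lines):
    """Count total hits and per-bot hits over an iterable of log lines."""
    total = 0
    bots = Counter()
    for line in lines:
        total += 1
        name = bot_name(user_agent(line))
        if name is not None:
            bots[name] += 1
    return total, bots
```

Because `summarise` accepts any iterable of lines, an open file can be passed in directly and read one line at a time, which matters when the log runs to gigabytes. Dividing `sum(bots.values())` by `total` gives the bot share, and `bots.most_common()` shows which crawler dominates.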