Comment by kmoser
19 days ago
My cheap and dirty way of dealing with bots like that is to block any IP address that accesses any URLs in robots.txt. It's not a perfect strategy but it gives me pretty good results given the simplicity to implement.
19 days ago
My cheap and dirty way of dealing with bots like that is to block any IP address that accesses any URLs in robots.txt. It's not a perfect strategy but it gives me pretty good results given the simplicity to implement.
I don't understand this. You don't have routes your users might need in robots.txt? This article is about bots accessing resources that other might use.
It seems better to put fake honeypot urls in robots.txt, and block any up that accesses those.
Blocking will never work.
You need to impose cost. Set up QoS buckets, slow suspect connections down dramatically (almost to the point of timeout).
Ah I see
How can I implement this?
Too many ways to list here, and implementation details will depend on your hosting environment and other requirements. But my quick-and-dirty trick involves a single URL which, when visited, runs a script which appends "deny from foo" (where foo is the naughty IP address) to my .htaccess file. The URL in question is not publicly listed, so nobody will accidentally stumble upon it and accidentally ban themselves. It's also specifically disallowed in robots.txt, so in theory it will only be visited by bad bots.
Another related idea: use fail2ban to monitor the server access logs. There is one filter that will ban hosts that request non-existent URLs like WordPress login and other PHP files. If your server is not hosting PHP at all it's an obvious sign that the requests are from bots that are probing maliciously.