
Comment by JSeiko

10 hours ago

We are having occasional dips in page speed due to LARGE amounts of bot traffic. Full disclosure: we've not really been able to resolve this fully/well. Let us know if you have a good idea for how to deal with it.

If it's purely bot traffic, then Anubis could help

You may have seen it on some websites already

https://anubis.techaro.lol/

  • Anubis only works against lazy scrapers, and at a cost to your users. I'd prefer people not use it.

    Bot traffic comes from machines that usually have a lot of idle CPU (since they're largely blocked on network IO as they scrape a bunch of sites in parallel), so they can trivially solve the Anubis "proof of work" challenge, save the cookie, and then not solve it again for that site.

    The only reason a scraper wouldn't solve it is that its developers were too lazy to implement a solver, and modern scrapers do solve it: Codeberg stopped using Anubis because modern scrapers were updated to solve it.

    The "proof of work" has to be easy or else people on old cell phones couldn't access your site (since an old Android phone would start to overheat and throttle trying to solve a challenge that would take even a modern server several seconds), and it also drains your cell-phone users' batteries, which are a really precious resource for them compared to the idle CPU on a server.
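To make the proof-of-work argument in the reply above concrete, here is a minimal hashcash-style sketch. It is a generic illustration, not Anubis's actual code: the function names `solve` and `verify`, the challenge string, and the "leading zero hex digits" difficulty rule are all assumptions made for the example.

```python
import hashlib
import itertools


def digest_for(challenge: str, nonce: int) -> str:
    """Hex SHA-256 of the challenge concatenated with the nonce."""
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()


def solve(challenge: str, difficulty: int) -> int:
    """Brute-force the first nonce whose digest starts with
    `difficulty` zero hex digits (hypothetical difficulty rule)."""
    prefix = "0" * difficulty
    for nonce in itertools.count():
        if digest_for(challenge, nonce).startswith(prefix):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Checking a solution costs one hash, regardless of difficulty."""
    return digest_for(challenge, nonce).startswith("0" * difficulty)


if __name__ == "__main__":
    # Each extra zero digit multiplies the expected work by 16, for
    # phones and scrapers alike. A client only pays it once if it
    # stores the resulting cookie.
    nonce = solve("example-challenge", 3)
    print(nonce, verify("example-challenge", nonce, 3))
```

The difficulty knob is the same for everyone, so whatever setting an old phone can tolerate is also what a server with idle CPU has to pay, and the server pays it only once per site if it keeps the cookie.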

I'm only a small-scale sysadmin, but the way I understand the internet is that you send abuse notifications to the IP address block owner and, if it doesn't get resolved, you block. The whois/RDAP database reveals which IPs all belong to the same hosting provider or ISP, so you can summarize it all into one list of IP addresses + timestamps per owner for a given time period.
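A rough sketch of that summarizing step, assuming you have already looked up the address blocks and abuse contacts via whois/RDAP. The `ALLOCATIONS` table, the contact addresses, and the `group_by_owner` helper are hypothetical, and the IPs are documentation ranges rather than real offenders:

```python
import ipaddress
from collections import defaultdict

# Hypothetical result of earlier whois/RDAP lookups:
# address block -> abuse contact of the block owner.
ALLOCATIONS = {
    "198.51.100.0/24": "abuse@isp-a.example",
    "203.0.113.0/24": "abuse@hoster-b.example",
}


def group_by_owner(events, allocations):
    """Group (ip, timestamp) pairs by the abuse contact of the block
    containing the IP. IPs outside every known block go to "unknown"."""
    networks = [
        (ipaddress.ip_network(cidr), contact)
        for cidr, contact in allocations.items()
    ]
    report = defaultdict(list)
    for ip_text, timestamp in events:
        ip = ipaddress.ip_address(ip_text)
        contact = next((c for net, c in networks if ip in net), "unknown")
        report[contact].append((ip_text, timestamp))
    return dict(report)


if __name__ == "__main__":
    # In practice these pairs would be parsed from your access log.
    events = [
        ("198.51.100.7", "2024-01-01T00:00:00Z"),
        ("203.0.113.9", "2024-01-01T00:00:05Z"),
        ("198.51.100.8", "2024-01-01T00:01:00Z"),
    ]
    for contact, hits in group_by_owner(events, ALLOCATIONS).items():
        print(contact, hits)
```

Each entry in the result is one abuse report: a single contact plus every offending IP and timestamp from their blocks.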

The ISP actually knows which subscriber is on that line, can send them notices, block them, terminate them... loads of things that you simply cannot do because you have no relation to this person. And frankly I wouldn't want to need to have a personal relation with every website that I visit; my ISP can reach me if there is anything relevant to continued use of the internet. From personal experience, when I was a teenager, the ISP cutting our household off after an abuse report was an effective way of stopping what I was doing

  • It’s effective against teenagers, maybe. Not so much against Amazon, Meta, or whatever botnets/crawlers up-and-coming AI companies are running out of China these days.

Do you host a torrent?

I have about 50k of the books. I would have used a torrent of just the txt files if one had been prominently offered.