Comment by mindslight

3 years ago

I do know the common narrative. FUD -> more snake oil "solutions". I myself rely on a special type of igneous rock that keeps hackers away. In reality:

1. Most sites only have this problem due to inefficient design. You are literally complaining about handling 1 request every 2 seconds! That's like a "C10μ problem."

2. How many IPs are these bots coming from? Rate limiting per source IP wouldn't be nearly as intrusive.

3. There are much less obtrusive ways of imposing resource requirements on a requester, like, say, a computational challenge (rough sketch below).

Not every website is the same, folks.
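
To make point 3 concrete, here is a rough sketch of a hashcash-style computational challenge. Everything in it (the names, the difficulty number, doing the client half in Python) is purely illustrative; in a real deployment the client side would run as JavaScript in the visitor's browser.

    # Hashcash-style proof of work: the server hands out a random nonce,
    # the client burns CPU finding a counter whose hash has enough leading
    # zero bits, and the server verifies the answer with a single hash.
    import hashlib
    import secrets

    DIFFICULTY_BITS = 20  # ~1 million hash attempts on average per challenge

    def leading_zero_bits(digest: bytes) -> int:
        bits = 0
        for byte in digest:
            if byte == 0:
                bits += 8
            else:
                bits += 8 - byte.bit_length()
                break
        return bits

    def issue_challenge() -> str:
        # Server side: cheap to generate; remember (or sign) it per visitor.
        return secrets.token_hex(16)

    def verify(nonce: str, counter: int) -> bool:
        # Server side: one SHA-256 per request, essentially free.
        digest = hashlib.sha256(f"{nonce}:{counter}".encode()).digest()
        return leading_zero_bits(digest) >= DIFFICULTY_BITS

    def solve(nonce: str) -> int:
        # Client side: the deliberately expensive part.
        counter = 0
        while not verify(nonce, counter):
            counter += 1
        return counter

A legitimate visitor pays a fraction of a second of CPU once; a crawler hammering you pays it on every request, and nobody has to fingerprint anyone's browser.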

> You are literally complaining about handling 1 request every 2 seconds

I don't know where this came from. The inconsiderate bots tend to flood your server, likely someone doing some sort of naïve parallel crawl. Not every website has a full-stack in-house team behind it to implement custom server-side throttles and what-not either.

However, as I already mentioned, if every single visitor is getting the challenge, then either the site is under attack right now or the operator has the security settings turned up too high. Some commonly targeted websites seem to keep those settings high even when they are not actively under attack. To those operators, staying online matters more than slightly annoying a small subset of visitors once.

  • > crawls every single webpage (and we have 10's of thousands) every couple days

    100,000 pages / (86,400 s/day * 2 days) ≈ 0.58 req/sec.

    I acknowledge that those requests are likely bursty, but you were complaining as if the total amount was the problem. If the instantaneous request rate is the actual problem, you should be able to throttle on that, no?

    I can totally believe your site has a bunch of accidental complexity that is harder to fix than just pragmatically hassling users. But it'd be better for everyone if this were acknowledged explicitly rather than talked about as an inescapable facet of the web.
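
    To make the "throttle on that" suggestion (point 2 above) concrete: a per-source-IP token bucket is roughly all it takes to cap the instantaneous rate while leaving the total crawl volume alone. A rough sketch, with made-up names and numbers:

        # Per-source-IP token bucket: absorbs a short burst, then caps the
        # sustained rate, without caring how many total pages get crawled.
        import time
        from collections import defaultdict

        RATE = 2.0    # sustained requests per second allowed per source IP
        BURST = 20.0  # how big a burst gets absorbed before throttling

        buckets = defaultdict(lambda: {"tokens": BURST, "ts": time.monotonic()})

        def allow(ip: str) -> bool:
            bucket = buckets[ip]
            now = time.monotonic()
            # Refill for the time elapsed since the last request, capped at BURST.
            bucket["tokens"] = min(BURST, bucket["tokens"] + (now - bucket["ts"]) * RATE)
            bucket["ts"] = now
            if bucket["tokens"] >= 1.0:
                bucket["tokens"] -= 1.0
                return True
            return False  # over the limit: send a 429, or fall back to a challenge

    Gate the expensive handler with allow(client_ip) and the bursty crawler gets slowed down or challenged, while that 0.58 req/sec average sails through untouched. Sites that would rather not write any code can get the same effect from a couple of config lines in their web server (nginx's limit_req, for example).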

    • > If the instantaneous request rate is the actual problem, you should be able to throttle on that, no?

      Again, not every website is the same, and not every website has a huge team behind it to deal with this stuff. Spending 30-something developer hours implementing custom rate limiting and throttling, distributed caching, gateways, etc. is absurd for probably 99% of websites.

      You can pay Cloudflare $0.00 and get good enough protection without spending a second thinking about the problem. That is why you see it commonly...

      If your website does not directly generate money for you or your business, then sinking a bunch of resources into it is silly. You will likely never experience this sort of challenge on an ecommerce site, for instance... but a blog or forum? Absolutely.

  • In my experience, if bots start flooding a server, it's the ISP/hosting provider that gets angry and contacts the owner first.