Comment by mindslight
3 years ago
> crawls every single webpage (and we have 10's of thousands) every couple days
100,000 / (86400 * 2) ≈ 0.58 req/sec.
I acknowledge that those requests are likely bursty, but you were complaining as if the total amount was the problem. If the instantaneous request rate is the actual problem, you should be able to throttle on that, no?
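To make "throttle on that" concrete, here's a minimal token-bucket sketch in Python. The class name, rate, and capacity are purely illustrative assumptions, not anyone's actual implementation; the point is just that average throughput and burst size are separate knobs:

```python
import time

class TokenBucket:
    """Permit a sustained `rate` of requests/sec while tolerating
    bursts up to `capacity` (all numbers here are hypothetical)."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # sustained requests/sec allowed
        self.capacity = capacity    # burst size tolerated
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # instantaneous rate exceeded: delay or reject
```

Something like TokenBucket(rate=1.0, capacity=100) would wave through the ~0.58 req/sec average indefinitely while still clamping a crawler that fires hundreds of requests at once.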
I can totally believe your site has a bunch of accidental complexity that is harder to fix than just pragmatically hassling users. But it'd be better for everyone if this were acknowledged explicitly rather than talked about as an inescapable facet of the web.
> If the instantaneous request rate is the actual problem, you should be able to throttle on that, no?
Again, not every website is the same, and not every website has a huge team behind it to deal with this stuff. Spending 30-something developer hours implementing custom rate limiting and throttling, distributed caching, gateways, etc. is absurd for probably 99% of websites.
You can pay Cloudflare $0.00 and get good enough protection without spending a second thinking about the problem. That is why you see it commonly...
If your website does not directly generate money for you or your business, then sinking a bunch of resources into it is silly. You will likely never experience this sort of challenge on an ecommerce site, for instance... but a blog or forum? Absolutely.
Actually, I get hassled all the time on various ecommerce sites. Because once centralizing entities make an easy-to-check "even moar security" box, people tend to check it lest they get blamed for not doing so. And then it stays on, since the legitimate users who closed the page out of frustration surely get counted in the "attackers protected against" metric!
I'd say you're really discounting the amount of hassle people get from these challenges. Some sites hassle users every visit. Some hassle users every few pages. Some hassle logged-in users. Some just go into loops (as in the OP). Some don't even pop up a challenge and straight-up deny based on IP address!
And since we're talking about abstract design, why can't Cloudflare et al change their implementations to throttle based on individual IPs, rather than blanket discriminating against more secure users? Maybe you personally have taken the best option available to you. But that doesn't imply the larger dynamic is justifiable.
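For illustration, the "per-IP rather than blanket" idea might look like the following sliding-window sketch in Python. The window and threshold are invented for the example, and this is emphatically not Cloudflare's actual mechanism:

```python
import time
from collections import defaultdict, deque

# Illustrative thresholds -- not Cloudflare's real numbers.
WINDOW_SECONDS = 10
MAX_PER_WINDOW = 30

_recent: dict[str, deque] = defaultdict(deque)

def allow_request(client_ip: str) -> bool:
    """Sliding-window limit keyed by source IP: one abusive crawler
    gets throttled without challenging every other visitor."""
    now = time.monotonic()
    q = _recent[client_ip]
    while q and now - q[0] > WINDOW_SECONDS:
        q.popleft()      # drop timestamps that fell out of the window
    if len(q) >= MAX_PER_WINDOW:
        return False     # throttle just this IP
    q.append(now)
    return True
```

The trade-off, of course, is that per-IP state does nothing against a distributed botnet and can still punish users behind shared NATs, which is presumably part of why blanket challenges exist at all.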
> why can't Cloudflare et al change their implementations to throttle based on individual IPs, rather than blanket discriminating against more secure users
Cloudflare does not do this - I've made that point several times. The website operator either has the security setting cranked to a paranoid level (which is not the default, btw), or they are experiencing an attack. Those are the only two scenarios where Cloudflare is going to inject a challenge as frequently as you assert.
Normally Cloudflare will only challenge after unusual behavior has been detected, such as an inhuman number of page requests within a short duration or manipulation of URLs and forms. The default settings are fairly unobtrusive in my experience.
If you are also complaining about generic captchas on forms and whatnot, that's a different thing entirely. Those exist as anti-bot measures, naturally, but also as anti-human measures. We simply do not want a pissed customer to send us 900 contact-us form requests one drunken evening...
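The "anti-human" half of that can be as blunt as a per-client daily cap, no captcha needed. A hypothetical sketch (the client_id key, the cap, and the in-memory storage are all invented for illustration):

```python
import time
from collections import defaultdict

# Hypothetical cap on contact-form submissions per client per day.
MAX_SUBMISSIONS_PER_DAY = 5
DAY_SECONDS = 86_400

_submissions: dict[str, list[float]] = defaultdict(list)

def accept_contact_form(client_id: str) -> bool:
    """Reject further submissions once a client exceeds the daily
    cap -- the guard against 900 drunken contact-us requests."""
    now = time.time()
    recent = [t for t in _submissions[client_id] if now - t < DAY_SECONDS]
    if len(recent) >= MAX_SUBMISSIONS_PER_DAY:
        return False
    recent.append(now)
    _submissions[client_id] = recent
    return True
```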
They may recrawl every 2 days, but fire the 100,000 requests in parallel when they do.