Comment by matheusmoreira
3 years ago
Why is it relevant whether the traffic is human or automated? The whole point of the internet is you can put a server out there and anyone anywhere can connect to it with any HTTP client.
To me it seems like the only people who care about that are those who want to sell our attention to the highest bidder via advertising. Wouldn't you be having the same difficulties if there were just as much traffic coming from humans?
I want to provide as many human beings as possible with value by distributing my processing power fairly between them. If I get DDoSed by a botnet, I won't provide anyone with anything other than, at best, an error page.
If I had infinite money and computing resources, this would be fine, but I'm just one guy with a not very powerful computer hosted on domestic broadband. Even though I give away compute freely, it takes just one bag of dicks with a botnet to use it all up for themselves, and without bot mitigation, I'm helpless to prevent it.
Oh and I actually do provide an API for free machine access, so it's not like they have to use headless browsers and go through the front door like this. But they still do.
Serves me right for trying to provide a useful service I guess?
Arguably, the problem here is that you want to do it free of charge. That's the problem in general: adtech aside, people want to discriminate between "humans" and "bots" in order to fairly distribute resources. What should be happening, though, is that every user - human and bot alike - covers their resource usage on the margin.
Tangent: there's a reason the browser is/used to be called a user agent. The web was meant to be accessed by automation. When I use a script to browse the web for me with curl, that script/curl is as much my agent as the browser is.
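To illustrate: a script is a user agent in exactly the same mechanical sense as a browser - it just identifies itself in a header. A minimal sketch in Python's standard library (the URL and agent string are placeholders, not anything from the thread):

```python
# A script acting as a user agent: it builds the same kind of HTTP
# request a browser would, with its own User-Agent header.
import urllib.request

req = urllib.request.Request(
    "https://example.com/",
    headers={"User-Agent": "my-personal-agent/1.0"},
)
# html = urllib.request.urlopen(req).read()  # uncomment to actually fetch

# urllib normalizes header keys to "Capitalized" form internally.
print(req.get_header("User-agent"))  # → my-personal-agent/1.0
```

From the server's perspective there is nothing distinguishing this from curl or a browser except the self-reported header - which is the point: "agent" is a role, not a particular piece of software.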
I see how remote attestation and other bot detection/prevention techniques make it cheaper for you to run the service the way you do. But the flip side is, those techniques will get everyone stuck using shitty, anti-ergonomic browsers and apps, whose entire UX is designed to best monetize the average person at every opportunity. In this reality, it wouldn't be possible to even start a service like yours without joining some BigCo that can handle the contractual load of interacting with every other business entity...
(Also need I remind everyone, that while the first customers of remote attestation are the DRM-ed media vendors, the second customer is your bank, and all the other banks.)
> The web was meant to be accessed by automation.
Completely agree.
I see. I respect that.
The bot detection won't come without cost. It will centralize power in the hands of Cloudflare and other giants. I think it's only a matter of time until they start exercising their powers. Is this really an acceptable tradeoff?
If we do accept it, I think the day will come when Cloudflare starts rejecting non-Chrome browsers, to say nothing of non-browser user agents.
I don't see any good options at this point. The situation profoundly sucks for everyone involved. We're stuck between the almost absurdly adversarial open web, or bargaining with the devil at Cloudflare, and now Google's remote attestation which is basically Google taking a stab at the problem.
To be clear, I don't think remote attestation is a good solution, but it's at least a solution. Any credible argument against Cloudflare or remote attestation needs to address the state of the open web and offer some sort of plan for how to fix it - or at least acknowledge that this is what Google and CF are trying to solve. Dismissing the problem as a bunch of mindless corporate greed just doesn't fly. It affects anyone trying to host anything on the Internet, and it's only getting worse. The status quo, and where it's heading, is completely untenable.
It's easy to say "well, just host static content", but that's ceding all of Internet discovery, navigation, discussion, and interactivity to big tech, irreversibly pulling up the ladder on any sort of free and independent competition in these areas. That's, in my opinion, a far greater problem.
Is this a purposeful DDoS or just bots trying to scrape results? If this is a DDoS on purpose, what's their financial gain? Did they demand payment?
If you're talking about bots scraping content, then the question is also why. Perhaps by letting them do so, you indirectly provide value to even more human beings?
It's entirely possible that these questions are absurd. However, since scraping with headless browsers is not free, there must be some reason for scraping a given service... and it's usually something that, in the end, benefits more human beings.
Best guess is it's some attempt at blackhat SEO, to manipulate the query logs and typeahead suggestions (I don't have query logs but whatever, maybe they think I secretly forward queries to Google or something).
But really, fuck if I know. I've received no communication, so I can only guess as to what they're trying to do. I have a free public API they're more than welcome to use if they want to actually use the search engine, but they still go through a botnet against the public web endpoint.
I've talked to a bunch of people operating other search engines, and all of them are subject to this type of 24/7 DDoS. It's been going on for nearly two years now.
>Why is it relevant whether the traffic is human or automated?
Because all traffic costs the service provider money, but automated traffic can impersonate thousands of users more cheaply than it costs to serve one human user. A human is bounded by time and the cost of computation and bandwidth; automation is not bound by time, which gives it the opportunity to DoS you - either on purpose or just accidentally.
But the solution for this is a rate limit, not a CAPTCHA. The real reason they care about "human traffic" is that bots don't buy stuff.
Rate limits do bupkis against a botnet. It's not possible to assume that each IP or connection is one person. The crux that all of these initiatives like remote attestation are trying to solve is that, as things stand, one person may command tens of thousands of connections, and from a server standpoint there's really not much you can do to allocate resources fairly.
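To make the failure mode concrete, here is a minimal sketch of a naive per-IP token bucket (a hypothetical illustration - the rates and IP addresses are made up, not anyone's real configuration). The limiter's whole premise is "one IP ≈ one user", which is exactly what a botnet breaks:

```python
# Naive per-IP token bucket: each IP gets its own refill budget.
import time
from collections import defaultdict

RATE = 1.0   # tokens refilled per second, per IP
BURST = 5.0  # maximum bucket size

buckets = defaultdict(lambda: {"tokens": BURST, "last": time.monotonic()})

def allow(ip: str) -> bool:
    """Admit the request if this IP's bucket still has a token."""
    b = buckets[ip]
    now = time.monotonic()
    b["tokens"] = min(BURST, b["tokens"] + (now - b["last"]) * RATE)
    b["last"] = now
    if b["tokens"] >= 1.0:
        b["tokens"] -= 1.0
        return True
    return False

# 100 rapid requests from a single abusive IP: only the burst gets through.
single = sum(allow("203.0.113.7") for _ in range(100))
# The same 100 requests spread across 100 botnet IPs: every one passes,
# because each IP gets a fresh bucket.
spread = sum(allow(f"10.0.{i}.1") for i in range(100))
print(single, spread)  # → 5 100
```

The per-IP limit caps one client, but the aggregate load from the botnet is untouched - each node stays comfortably under its individual budget while the server is still saturated.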
You're the first person to say anything about CAPTCHAs. The person who started this argument about needing some way to sort out human traffic operates a free service, and is complaining that bot traffic makes it hard to offer a free service, since bots cost money.