Comment by jsheard

1 day ago

It could, the idea is just to tip the economics such that it's not worth it for the bot operator. That kind of abuse typically happens at a vast scale where the cost of solving the challenges adds up fast.

That's not exactly true. You don't need to solve the challenge for each request, because PoW systems give you a session token that stays valid for a while.

Basically you need session-token generators, which are usually automated headless browsers.

Another not-exactly-valid point is the botnet: you don't need one. You can scrape at scale from a single machine using proxies, and proxies are dirt cheap.

So basically you generate a session for a proxy IP and scrape as long as the token is valid. No botnets, no magic, nada. Just business.
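To put rough numbers on that, here is a back-of-the-envelope sketch of how a session token amortizes the solve cost. The function name and all the figures (a 5 second solve, a one hour token, 2 requests per second) are made up for illustration, not taken from any particular PoW system.

```python
def amortized_pow_seconds(solve_seconds: float,
                          token_lifetime_s: float,
                          requests_per_s: float) -> float:
    """CPU seconds of proof-of-work paid per request.

    One solve buys a token; every request sent while the token is
    valid shares the cost of that single solve.
    """
    # At least one request is always covered by a solve.
    requests_per_token = max(1.0, token_lifetime_s * requests_per_s)
    return solve_seconds / requests_per_token


if __name__ == "__main__":
    # Hypothetical numbers: 5 s solve, token valid for 1 hour, 2 req/s.
    per_request = amortized_pow_seconds(5.0, 3600.0, 2.0)
    print(f"amortized cost: {per_request:.6f} s per request")

    # Same solve cost if a fresh challenge were required on every request.
    print(f"no token reuse: {amortized_pow_seconds(5.0, 0.0, 2.0):.1f} s per request")
```

With those assumed numbers the solve is spread over 7,200 requests, so the per-request cost is well under a millisecond instead of 5 seconds.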

Botnets don't even use their own hardware.

Why would someone renting dirt cheap botnet time care if requests to your site take a few seconds longer?

Plus, the requests are still getting through after waiting a few seconds, so it does nothing for the website operator and just burns battery for legit users.

  • If you operate a botnet that normally scrapes a few dozen pages per second and then notice a site suddenly taking multiple seconds per page, that's at least an order of magnitude (or two) decrease in performance. If you care at all about your efficiency, you step in and put that site on your blacklist.

    Even if the bot owner doesn't watch (or care about) their crawling metrics, at least the botnet is not DDoSing the site in the meantime.

    This is essentially a client-side tarpit, and tarpits are actually pretty effective against all forms of bot traffic while not impacting legitimate users very much, if at all.

    • A tarpit is selective. You throw bad clients in the tarpit.

      This is something you put everyone through: both your abusive clients (running on stolen or datacenter hardware) and your real clients (running on battery-powered laptops and phones). More like a tar-checkpoint.

  • There is still an opportunity cost. They can scrape just your site, or they can scrape 100 other sites without PoW (no idea whether it's 10, 100, etc.).

    • Websites aren't really fungible like that, and where they are (like general search indexing for example), that's usually the least hostile sort of automated traffic. But if that's all you care about, I'll cede the point.

      Usually if you're going to go through the trouble of integrating a captcha, you want to protect against targeted attacks like a forum spammer, where you don't want to let the abusive requests through at all, not just let them through after 5000 ms.

  • Botnets just shift the bottleneck from "how much compute can they afford to buy legit" to "how many machines can they compromise or afford to buy on the black market". Either way it's a finite resource, so making each abusive request >10,000x more expensive still severely limits how much damage they can do, especially when a lot of botnet nodes are IoT junk with barely any CPU power to speak of.
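For readers who haven't seen one, a minimal hashcash-style sketch of the kind of challenge being argued about. This is an illustration under assumptions, not the scheme any specific product uses: the challenge string, the SHA-256 choice, and the function names are all hypothetical. The point is the asymmetry: the client needs about 2^difficulty_bits hashes on average to find a nonce, while the server checks it with a single hash.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Number of zero bits at the front of a digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Server side: one hash to check a submitted nonce."""
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits


def solve(challenge: bytes, difficulty_bits: int) -> int:
    """Client side: brute-force the smallest nonce that passes verify()."""
    nonce = 0
    while not verify(challenge, nonce, difficulty_bits):
        nonce += 1
    return nonce


if __name__ == "__main__":
    challenge = b"example-challenge"  # hypothetical server-issued value
    bits = 12                         # ~4,096 expected hashes to solve
    nonce = solve(challenge, bits)
    print("nonce found:", nonce, "valid:", verify(challenge, nonce, bits))
```

Each extra difficulty bit doubles the expected client work, so 14 bits is roughly 16,384 hashes per solve against one hash per verification. How that translates into real cost depends on the hardware and on how long the resulting token stays valid.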

That's definitely the idea.

So the crazy decentralized mystery botnet(s) that are affecting many of us don't seem to be that worried about cost. They are making millions of duplicate requests for duplicate, useless content; it's pretty wild.

On the other hand, they ALSO don't seem to be running user-agents that execute JavaScript.

This comes from the findings of a group of colleagues at peer non-profits who have been sharing notes to try to understand what's going on.

So the fact that they don't run JS at present means that PoW would stop them -- but so would something much simpler and cheaper relying on JS.
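As a sketch of what "something much simpler" could look like: the server hands out a tiny script that sets a cookie, and only serves content once the cookie comes back. Everything here is hypothetical (the `js_ok` cookie name, the secret, the helper names), and a bot that parses the token out of the HTML could bypass it without running JS, so it only filters clients that do neither.

```python
import hashlib
import hmac

SECRET = b"hypothetical-server-secret"  # assumption: kept private on the server


def issue_token(client_ip: str) -> str:
    """Token bound to the client IP so it can't be reused elsewhere."""
    return hmac.new(SECRET, client_ip.encode(), hashlib.sha256).hexdigest()


def challenge_page(client_ip: str) -> str:
    """Page served to clients without the cookie; needs JS to proceed."""
    token = issue_token(client_ip)
    return (
        "<script>"
        f'document.cookie = "js_ok={token}; path=/";'
        "location.reload();"
        "</script>"
    )


def is_allowed(client_ip: str, cookies: dict) -> bool:
    """True only if the client sent back the token the script would set."""
    supplied = cookies.get("js_ok", "")
    return hmac.compare_digest(supplied, issue_token(client_ip))
```

A client that never executes the script never sets `js_ok`, so `is_allowed` stays false for it, and the legitimate browser pays one page reload rather than seconds of hashing.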

If this becomes popular, could they afford to run JS and to calculate the PoW?

It's really unclear. The behavior of these things doesn't make enough sense to me to have much of a theory about what their costs, benefits, or budgets are; it's all a mystery to me.

Definitely hoping someone manages to figure out who's really behind this, and why, at some point. (I am definitely not assuming it's a single entity, either.)