Comment by fluoridation

Hmm... What if instead of using plain SHA-256 it was a dynamically tweaked hash function that forced the client to run it in JS?
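For reference, the kind of challenge under discussion is a Hashcash-style proof of work: the server hands the client a token, and the client grinds nonces until a SHA-256 hash meets a difficulty target. A minimal sketch, with hypothetical names and parameters, not any particular implementation:

```typescript
// Hashcash-style PoW sketch: find a nonce such that
// SHA-256(challenge + ":" + nonce) starts with `difficulty` zero hex digits.
// Runs under Node.js; everything here is illustrative only.
import { createHash } from "node:crypto";

function solveChallenge(challenge: string, difficulty: number): number {
  const target = "0".repeat(difficulty);
  for (let nonce = 0; ; nonce++) {
    const digest = createHash("sha256")
      .update(`${challenge}:${nonce}`)
      .digest("hex");
    if (digest.startsWith(target)) {
      return nonce; // client sends this back; the server verifies with a single hash
    }
  }
}

// Each extra hex digit of difficulty multiplies the client's expected work by 16,
// while verification on the server stays one hash.
console.log(solveChallenge("token-from-server", 4));
```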

No, the economics will never work out for a Proof of Work-based counter-abuse challenge. CPU is just too cheap in comparison to the cost of human latency. An hour of a server CPU costs $0.01. How much is an hour of your time worth?

That's all the asymmetry you need to make it unviable. Even if the attacker is no better at solving the challenge than your browser is, there's no way to tune the monetary cost to be even in the ballpark of the cost imposed on legitimate users. So there's no point in theorizing about an attacker solving the challenges more cheaply than a real user's computer, and thus no point in trying to design a different proof of work that's more resistant to whatever trick the attackers are using to solve it cheaply. Because there's no trick.
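Putting that asymmetry into numbers (the $20/hour value of a person's time is an assumption, purely for illustration):

```typescript
// CPU time versus human time, using the $0.01/core-hour figure above and an
// assumed $20/hour value for the user's time.
const cpuDollarsPerHour = 0.01;  // server CPU core, figure from the comment
const humanDollarsPerHour = 20;  // assumed, for illustration

// The attacker pays the CPU price; the legitimate user pays in latency.
console.log(`gap: ${humanDollarsPerHour / cpuDollarsPerHour}x`); // 2000x
```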

  • But for a scraper to be effective it has to load orders of magnitude more pages than a human browses, so a fixed delay makes a human take 1.1x as long but slows a scraper down by 100x. Requiring 100x more hardware to do the same job is absolutely a significant economic impediment.

    • The entire problem is that proof of work does not increase the cost of scraping by 100x. It does not even increase it by 100%. If you run the numbers, a reasonable estimate is that it increases the cost by maybe 0.1%. It is pure snake oil.

  • > An hour of a server CPU costs $0.01. How much is an hour of your time worth?

    That's irrelevant. A human is not going to be solving the challenge by hand, nor is the computer of a legitimate user going to be solving the challenge continuously for one hour. The real question is: does the challenge slow clients down enough that the server doesn't expend outsized resources serving a flood of requests from just a few users?

    > Even if the attacker is no better at solving the challenge than your browser is, there's no way to tune the monetary cost to be even in the ballpark of the cost imposed on legitimate users.

    No, I disagree. If the challenge takes, say, 250 ms on the absolute best hardware, and serving a request takes 25 ms, a normal user won't even see a difference, while a scraper will see a tenfold slowdown when scraping that website.

    • The problem with proof-of-work is that many legitimate users are on battery-powered, five-year-old smartphones, while the scraping servers are huge, 96-core, quadruple-power-supply beasts.

    • The human needs to wait for their computer to solve the challenge.

      You are trading something dirt-cheap (CPU time) for something incredibly expensive (human latency).

      Case in point:

      > If the challenge takes, say, 250 ms on the absolute best hardware, and serving a request takes 25 ms, a normal user won't even see a difference, while a scraper will see a tenfold slowdown when scraping that website.

      No. A human sees a 10x slowdown. A human on a low-end phone sees a 50x slowdown.

      And the scraper paid 1/1,000,000th of a dollar. (The scraper does not care about latency.)

      That is not an effective deterrent. And there is no difficulty factor for the challenge that will work: either you add too much latency for real users, or passing the challenge stays too cheap to deter scrapers. (A rough worked version of these numbers is sketched just after this thread.)

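A rough worked version of the numbers traded in the thread above, using the $0.01/core-hour, 250 ms, and 25 ms figures. The 10-million-page scrape size and the 5x-slower-phone factor are assumptions added for illustration:

```typescript
// Worked numbers: what the scraper pays in dollars vs. what the human pays in latency.
const cpuDollarsPerHour = 0.01;  // server CPU core
const challengeSeconds = 0.25;   // 250 ms challenge on fast hardware
const serveSeconds = 0.025;      // 25 ms to serve a request

// Scraper's cost per page: on the order of a millionth of a dollar.
const dollarsPerChallenge = cpuDollarsPerHour * (challengeSeconds / 3600);
console.log(`per-page cost: $${dollarsPerChallenge.toExponential(1)}`); // ~$6.9e-7

// Even a 10-million-page scrape (assumed size) adds only a few dollars of compute.
const pages = 10_000_000;
console.log(`whole-scrape cost: $${(dollarsPerChallenge * pages).toFixed(2)}`); // ~$6.94

// Human's cost: latency. On a phone ~5x slower (assumed), 250 ms becomes 1.25 s per page.
const phoneFactor = 5;
console.log(`slowdown on fast hardware: ${challengeSeconds / serveSeconds}x`);                   // 10x
console.log(`slowdown on a low-end phone: ${(challengeSeconds * phoneFactor) / serveSeconds}x`); // 50x
```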

Crawlers can run JS, and can also invest in running the Proof-of-JS better than you can.

  • Anubis doesn't target crawlers which run JS (or those which use a headless browser, etc.). It's meant to block the low-effort crawlers that tend to make up large swaths of spam traffic. One can argue about the efficacy of this approach, but those higher-effort crawlers are out of scope for the project.

    • Wait, but then why bother with this PoW system at all? If they're just trying to block anyone without JS, that's way easier and doesn't require slowing things down for end users on old devices.

    • Reminds me of how Wikipedia literally has all the data available, even in a nice format just for scrapers (I think), and even THEN some scrapers still scraped Wikipedia and actually made it lose so much money that I'm pretty sure some official statement had to be made, or they at least disclosed it without an official statement.

      Even then, man, I feel like so many resources (both yours and Wikipedia's) could be saved if scrapers had the sense not to scrape Wikipedia and instead followed Wikipedia's rules.

  • If we're presupposing an adversary with infinite money, then there's no solution; one may as well just take the site offline. The point is to spend effort in such a way that the adversary has to spend much more effort, hopefully so much that it's impractical.