Comment by fluoridation

Hmm... What if instead of using plain SHA-256 it was a dynamically tweaked hash function that forced the client to run it in JS?
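For reference, the kind of challenge under discussion is a Hashcash-style proof of work: the server hands the client a token, and the client grinds nonces until a SHA-256 hash meets a difficulty target. A minimal sketch, with hypothetical names and parameters, not any particular implementation:

```typescript
// Hashcash-style PoW sketch: find a nonce such that
// SHA-256(challenge + ":" + nonce) starts with `difficulty` zero hex digits.
// Runs under Node.js; everything here is illustrative only.
import { createHash } from "node:crypto";

function solveChallenge(challenge: string, difficulty: number): number {
  const target = "0".repeat(difficulty);
  for (let nonce = 0; ; nonce++) {
    const digest = createHash("sha256")
      .update(`${challenge}:${nonce}`)
      .digest("hex");
    if (digest.startsWith(target)) {
      return nonce; // client sends this back; the server verifies with a single hash
    }
  }
}

// Each extra hex digit of difficulty multiplies the client's expected work by 16,
// while verification on the server stays one hash.
console.log(solveChallenge("token-from-server", 4));
```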

No, the economics will never work out for a Proof of Work-based counter-abuse challenge. CPU is just too cheap in comparison to the cost of human latency. An hour of a server CPU costs $0.01. How much is an hour of your time worth?

That's all the asymmetry you need to make it unviable. Even if the attacker is no better at solving the challenge than your browser is, there's no way to tune the monetary cost to be even in the ballpark of the cost imposed on legitimate users. So there's no point in theorizing about an attacker solving the challenges more cheaply than a real user's computer, and thus no point in trying to design a different proof of work that's more resistant to whatever trick the attackers are using to solve it cheaply. Because there's no trick.
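Putting that asymmetry into numbers (the $20/hour value of a person's time is an assumption, purely for illustration):

```typescript
// CPU time versus human time, using the $0.01/core-hour figure above and an
// assumed $20/hour value for the user's time.
const cpuDollarsPerHour = 0.01;  // server CPU core, figure from the comment
const humanDollarsPerHour = 20;  // assumed, for illustration

// The attacker pays the CPU price; the legitimate user pays in latency.
console.log(`gap: ${humanDollarsPerHour / cpuDollarsPerHour}x`); // 2000x
```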

  • But for a scraper to be effective it has to load orders of magnitude more pages than a human browses, so a fixed delay makes a human take 1.1x as long but slows a scraper down by 100x. Requiring 100x more hardware to do the same job is absolutely a significant economic impediment.

    • The entire problem is that proof of work does not increase the cost of scraping by 100x. It does not even increase it by 100%. If you run the numbers, a reasonable estimate is that it increases the cost by maybe 0.1%. It is pure snake oil.

  • > An hour of a server CPU costs $0.01. How much is an hour of your time worth?

    That's irrelevant. A human is not going to be solving the challenge by hand, nor is the computer of a legitimate user going to be solving the challenge continuously for one hour. The real question is: does the challenge slow clients down enough that the server doesn't expend outsized resources serving a flood of requests from just a few users?

    > Even if the attacker is no better at solving the challenge than your browser is, there's no way to tune the monetary cost to be even in the ballpark of the cost imposed on legitimate users.

    No, I disagree. If the challenge takes, say, 250 ms on the absolute best hardware, and serving a request takes 25 ms, a normal user won't even see a difference, while a scraper will see a tenfold slowdown when scraping that website.

    • The problem with proof-of-work is that many legitimate users are on battery-powered, five-year-old smartphones, while the scraping servers are huge, 96-core, quadruple-power-supply beasts.

    • The human needs to wait for their computer to solve the challenge.

      You are trading something dirt-cheap (CPU time) for something incredibly expensive (human latency).

      Case in point:

      > If the challenge takes, say, 250 ms on the absolute best hardware, and serving a request takes 25 ms, a normal user won't even see a difference, while a scraper will see a tenfold slowdown when scraping that website.

      No. A human sees a 10x slowdown. A human on a low-end phone sees a 50x slowdown.

      And the scraper paid 1/1,000,000th of a dollar. (The scraper does not care about latency.)

      That is not an effective deterrent. And there is no difficulty factor for the challenge that will work: either you add too much latency for real users, or passing the challenge stays too cheap to deter scrapers. (A rough worked version of these numbers is sketched just after this thread.)

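A rough worked version of the numbers traded in the thread above, using the $0.01/core-hour, 250 ms, and 25 ms figures. The 10-million-page scrape size and the 5x-slower-phone factor are assumptions added for illustration:

```typescript
// Worked numbers: what the scraper pays in dollars vs. what the human pays in latency.
const cpuDollarsPerHour = 0.01;  // server CPU core
const challengeSeconds = 0.25;   // 250 ms challenge on fast hardware
const serveSeconds = 0.025;      // 25 ms to serve a request

// Scraper's cost per page: on the order of a millionth of a dollar.
const dollarsPerChallenge = cpuDollarsPerHour * (challengeSeconds / 3600);
console.log(`per-page cost: $${dollarsPerChallenge.toExponential(1)}`); // ~$6.9e-7

// Even a 10-million-page scrape (assumed size) adds only a few dollars of compute.
const pages = 10_000_000;
console.log(`whole-scrape cost: $${(dollarsPerChallenge * pages).toFixed(2)}`); // ~$6.94

// Human's cost: latency. On a phone ~5x slower (assumed), 250 ms becomes 1.25 s per page.
const phoneFactor = 5;
console.log(`slowdown on fast hardware: ${challengeSeconds / serveSeconds}x`);                   // 10x
console.log(`slowdown on a low-end phone: ${(challengeSeconds * phoneFactor) / serveSeconds}x`); // 50x
```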

Crawlers can run JS, and can also invest in running the Proof-of-JS better than you can.

  • Anubis doesn't target crawlers which run JS (or those which use a headless browser, etc.). It's meant to block the low-effort crawlers that tend to make up large swaths of spam traffic. One can argue about the efficacy of this approach, but those higher-effort crawlers are out of scope for the project.

    • Wait, but then why bother with this PoW system at all? If they're just trying to block anyone without JS, that's way easier and doesn't require slowing things down for end users on old devices.

    • Reminds me of how Wikipedia literally has all the data available, even in a nice format just for scrapers (I think), and even THEN some scrapers still scraped Wikipedia and actually made it lose so much money that I'm pretty sure some official statement had to be made, or they at least disclosed it without an official statement.

      Even then, man, I feel like so many resources (both yours and Wikipedia's) could be saved if scrapers had the sense not to scrape Wikipedia and instead followed Wikipedia's rules.

  • If we're presupposing an adversary with infinite money, then there's no solution; one may as well just take the site offline. The point is to spend effort in such a way that the adversary has to spend much more effort, hopefully so much that it's impractical.