Comment by yabones
8 days ago
My understanding is that it just increases the "expense" of mass crawling enough to put it out of reach. If it costs fractional pennies per page to scrape with a plain Python or Go bot, it costs nickels and dimes to run a headless Chromium instance to do the same thing. The purpose is economic: make it too expensive to scrape the "open web". Whether it achieves that goal is another thing.
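For concreteness, the "expense" is a proof-of-work loop the client has to run before the page is served. Here is a minimal Python sketch of an Anubis-style challenge; counting difficulty in leading zero hex digits is an assumption about the exact encoding, not a claim about Anubis internals:

    import hashlib

    def solve(challenge: str, difficulty: int = 4) -> int:
        """Brute-force a nonce so that sha256(challenge + nonce)
        starts with `difficulty` zero hex digits. The encoding is
        assumed; the shape of the scheme is the point."""
        nonce = 0
        target = "0" * difficulty
        while True:
            digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
            if digest.startswith(target):
                return nonce
            nonce += 1

Each extra hex digit of difficulty multiplies the expected work by 16, which is the knob the economics argument turns on.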
What do AI companies have more than everyone else? Compute.
Anubis directly incentivises the adversary, at the expense of everyone else.
It's what you would deploy if you wanted to exclude everyone else.
(Conspiracy theorists will note that the author worked for an AI firm.)
"what do AI companies have more than everyone else? compute"
"Everyone else" actually has staggering piles of compute, utterly dwarfing the cloud, utterly dwarfing all the AI companies, dwarfing everything. It's also generally "free" on the margin. That is, if your web page takes 10 seconds to load due to an Anubis challenge, in principle you can work out what it is costing me but in practice it's below my noise floor of life expenses, pretty much rolled in to the cost of the device and my time. Whereas the AI companies will notice every increase of the Anubis challenge strength as coming straight out of their bottom line.
This is still a solid and functional approach. It was always going to be an arms race, not a magic solution, but this approach at least slants the arms race in the direction the general public can win.
(Perhaps tipping it in the direction of something CPUs can do but GPUs can't would help. Something like an scrypt-based challenge instead of a SHA-256 challenge (https://en.wikipedia.org/wiki/Scrypt), or some sort of problem where you need to explore a structure in parallel but the branches have to cross-talk all the time and the RAM is comfortably more than a single GPU processing element can address. Also, I think "just check once per session" is not going to cut it, but there are ways to make a user generate a couple of tokens before clicking the next link, so it looks like they only have to check once per page unless they are clicking very quickly.)
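A memory-hard variant of the same loop is easy to sketch with the scrypt that ships in Python's hashlib; the parameters here are illustrative, not tuned recommendations:

    import hashlib

    def solve_memory_hard(challenge: bytes, difficulty_bits: int = 12) -> int:
        """Find a nonce whose scrypt digest has `difficulty_bits`
        leading zero bits. n=2**15, r=8 forces ~32 MB of working
        memory per attempt (illustrative values)."""
        nonce = 0
        while True:
            digest = hashlib.scrypt(nonce.to_bytes(8, "big"), salt=challenge,
                                    n=2**15, r=8, p=1,
                                    maxmem=64 * 1024 * 1024, dklen=32)
            if int.from_bytes(digest, "big") >> (256 - difficulty_bits) == 0:
                return nonce
            nonce += 1

A CPU core has 32 MB of working memory for free, but packing thousands of parallel attempts into a GPU's per-thread memory budget gets expensive, which is exactly the tilt being suggested.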
Anubis increases the minimum amount of compute required to request and crawl a page. How does that incentivize the adversary?
"Everyone else" (individually) isn't going to millions of webpages per day.
But it doesn't cost scrapers millions of solved challenges to go to millions of webpages on a single origin. Once you solve an Anubis challenge you get a signed JWT that lets you scrape a given site an unlimited number of times for a configurable window (roughly a day). So in practice it doesn't actually cost the scrapers a large amount in proportion to their usage. It actually costs them proportionally less than a normal human.
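In other words, the token flow looks something like this. A minimal HMAC-signed sketch; the claim names and the exact format are invented here, not taken from Anubis:

    import base64, hashlib, hmac, json, time

    SECRET = b"server-side signing key"  # per-deployment secret

    def mint_token(client_id: str, ttl: int = 86400) -> str:
        # Issued once per solved challenge; valid for ~a day.
        body = base64.urlsafe_b64encode(json.dumps(
            {"sub": client_id, "exp": int(time.time()) + ttl}).encode())
        sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
        return f"{body.decode()}.{sig}"

    def verify_token(token: str) -> bool:
        # Any number of requests pass until expiry; nothing in the
        # token meters how many pages it has already fetched.
        body, sig = token.rsplit(".", 1)
        expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, sig):
            return False
        return json.loads(base64.urlsafe_b64decode(body))["exp"] > time.time()

One solved challenge mints a pass good for a day of unlimited requests, which is why the cost amortizes to nearly nothing at crawler volumes.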
To actually make it expensive for scrapers, every page would need a new challenge, and real human users would not tolerate that. Alternatively, the challenge solution would need to be tied to a stateful reward that only entitles the holder to a human-level amount of subsequent requests.
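The second option could look something like this on the server side; a sketch where each solved challenge buys a fixed request budget, with the numbers invented:

    # Hypothetical server-side ledger: token -> remaining request budget.
    budgets: dict[str, int] = {}

    REQUESTS_PER_SOLVE = 200  # invented: roughly one human browsing session

    def on_challenge_solved(token: str) -> None:
        # Each solved proof of work buys a human-sized allowance.
        budgets[token] = budgets.get(token, 0) + REQUESTS_PER_SOLVE

    def on_request(token: str) -> bool:
        # Serve only while budget remains; a crawler fetching millions
        # of pages now needs thousands of solves per day.
        remaining = budgets.get(token, 0)
        if remaining <= 0:
            return False  # respond with a fresh challenge instead
        budgets[token] = remaining - 1
        return True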
Please don't downvote comments just because you don't like their opinion (reply to them instead). It cannot be that the same opinion is only valuable when someone famous writes it [1].
[1] https://news.ycombinator.com/item?id=44962529