Comment by ksymph

3 months ago

Reading the original release post for Anubis [0], it seems like it operates mainly on the assumption that AI scrapers have limited support for JS, particularly modern features. At its core it's security through obscurity; I suspect that as usage of Anubis grows, more scrapers will deliberately implement the features needed to bypass it.

That doesn't necessarily mean it's useless, but it also isn't really meant to block scrapers in the way TFA expects it to.

[0] https://xeiaso.net/blog/2025/anubis/

7 comments

ksymph

jhanschoo 3 months ago

Your link explicitly says:

> It's a reverse proxy that requires browsers and bots to solve a proof-of-work challenge before they can access your site, just like Hashcash.

It's meant to rate-limit accesses by requiring client-side compute light enough for legitimate human users and responsible crawlers in order to access but taxing enough to cost indiscriminate crawlers that request host resources excessively.

It indeed mentions that lighter crawlers do not implement the right functionality in order to execute the JS, but that's not the main reason why it is thought to be sensible. It's a challenge saying that you need to want the content bad enough to spend the amount of compute an individual typically has on hand in order to get me to do the work to serve you.

ksymph 3 months ago
Here's a more relevant quote from the link:
> Anubis is a man-in-the-middle HTTP proxy that requires clients to either solve or have solved a proof-of-work challenge before they can access the site. This is a very simple way to block the most common AI scrapers because they are not able to execute JavaScript to solve the challenge. The scrapers that can execute JavaScript usually don't support the modern JavaScript features that Anubis requires. In case a scraper is dedicated enough to solve the challenge, Anubis lets them through because at that point they are functionally a browser.
As the article notes, the work required is negligible, and as the linked post notes, that's by design. Wasting scraper compute is part of the picture to be sure, but not really its primary utility.
- kevincox 3 months ago
  
  Why require proof of work with difficulty at all then? Just have no UI other than (javascript) required and run a trivial computation in WASM as a way of testing for modern browser features. That way users don't complain that it is taking 30s on their low-end phone and it doesn't make it any easier for scrapers to scrape (because the PoW was trivial anyways).
ranger_danger 3 months ago
The compute also only seems to happen once, not for every page load, so I'm not sure how this is a huge barrier.
- untilted 3 months ago
  
  Once per ip. Presumably there's ip-based rate limiting implemented on top of this, so it's a barrier for scrapers that aggressively rotate ip's to circumvent rate limits.
- debugnik 3 months ago
  
  It happens once if the user agent keeps a cookie that can be used for rate limiting. If a crawler hits the limit they need to either wait or throw the cookie away and solve another challenge.
  
  1 reply →