Comment by agwa · 3 days ago

It sounds like you're saying that it's not the proof-of-work that's stopping AI scrapers, but the fact that Anubis imposes an unusual flow to load the site.

If that's true Anubis should just remove the proof-of-work part, so legitimate human visitors don't have to stare at a loading screen for several seconds while their device wastes electricity.

> If that's true Anubis should just remove the proof-of-work part

This is my very strong belief. To make it even clearer how absurd the present situation is, every single one of the proof-of-work systems I’ve looked at has been using SHA-256, which is basically the worst choice possible.

Proof-of-work is a bad form of rate limiting that depends on a level playing field between real users and attackers. This is already a doomed endeavour. Using SHA-256 just makes it more obvious: there's an asymmetry factor on the order of tens of thousands between common real-user hardware and software on one side, and readily available attacker hardware and software on the other. You cannot bridge such a divide. If you allow the attacker to bring in a Bitcoin mining rig, the efficiency disparity factor can go up to tens of millions.

These proof-of-work systems are only working because attackers haven’t tried yet. And as long as attackers aren’t trying, you can settle for something much simpler and more transparent.

If they were serious about the proof-of-work being the defence, they’d at least have started with something like Argon2d.
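
To make the asymmetry concrete, here is a minimal sketch of the generic shape these challenges take: find a nonce such that the SHA-256 digest of challenge-plus-nonce has enough leading zeroes. The names and difficulty here are illustrative, not any particular system's wire format. A plain native loop like this manages on the order of millions of hashes per second per core, a browser tab round-tripping through the Web Crypto API manages far fewer, and dedicated SHA-256 hardware is faster by further orders of magnitude.

```ts
// Minimal sketch of a generic SHA-256 proof-of-work solver (illustrative,
// not any specific system's protocol): find a nonce so that
// SHA-256(challenge + nonce) starts with `difficulty` zero hex nibbles.
import { createHash } from "node:crypto";

function solve(challenge: string, difficulty: number): number {
  const target = "0".repeat(difficulty);
  for (let nonce = 0; ; nonce++) {
    const digest = createHash("sha256").update(challenge + nonce).digest("hex");
    if (digest.startsWith(target)) return nonce; // hash meets the target
  }
}

// A native process clears a browser-grade difficulty in milliseconds.
console.log(solve("example-challenge", 4));
```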

  • The proof of work isn't really the crux. They've been pretty clear about this from the beginning.

    I'll just quote from their blog post from January.

    https://xeiaso.net/blog/2025/anubis/

    Anubis also relies on modern web browser features:

    - ES6 modules to load the client-side code and the proof-of-work challenge code.

    - Web Workers to run the proof-of-work challenge in a separate thread to avoid blocking the UI thread.

    - Fetch API to communicate with the Anubis server.

    - Web Cryptography API to generate the proof-of-work challenge.

    This ensures that browsers are decently modern in order to combat most known scrapers. It's not perfect, but it's a good start.

    This will also lock out users who have JavaScript disabled, prevent your server from being indexed in search engines, require users to have HTTP cookies enabled, and require users to spend time solving the proof-of-work challenge.

    This does mean that users using text-only browsers or older machines where they are unable to update their browser will be locked out of services protected by Anubis. This is a tradeoff that I am not happy about, but it is the world we live in now.
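
    To make that concrete, here is an illustrative sketch of how those features typically fit together on the client side of such a challenge page. This is not code from the blog post or from Anubis; the worker file name and the endpoint are made up, and the SHA-256 hashing is assumed to live in the worker (where it would use the Web Cryptography API, e.g. crypto.subtle.digest).

    ```ts
    // Illustrative only: main ES module of a hypothetical challenge page.
    // The Web Worker keeps the hash search off the UI thread.
    const worker = new Worker(new URL("./pow-worker.js", import.meta.url), {
      type: "module",
    });

    // Hand the server-issued challenge to the worker (values are placeholders).
    worker.postMessage({ challenge: "issued-by-server", difficulty: 4 });

    worker.onmessage = async (event: MessageEvent<{ nonce: number }>) => {
      // The Fetch API submits the solution; the server responds by setting
      // the cookie that lets subsequent requests through.
      await fetch("/pass-challenge", {
        method: "POST",
        headers: { "content-type": "application/json" },
        body: JSON.stringify(event.data),
      });
      location.reload(); // retry the original page now that the cookie is set
    };
    ```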

    • Except this is exactly the problem. Now you are checking for mainstream browsers instead of some notion of legitimate users. And as TFA shows, a motivated attacker can bypass all of that while legitimate users of non-mainstream browsers are blocked.

    • Aren't most scrapers using things like Playwright or Puppeteer anyway by now, especially since so many pages are rendered with JS and, even without Anubis, would be unreadable without executing modern JavaScript?
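
      For what it's worth, here is a minimal sketch of such a crawler, assuming Playwright and treating the URL and wait strategy as placeholders. Because it drives a real headless Chromium, ES modules, Web Workers, fetch, and Web Crypto all run exactly as they would for a person, and whatever cookie a challenge sets is kept for reuse.

      ```ts
      // Sketch of a Playwright-based crawler; URL and file name are placeholders.
      import { chromium } from "playwright";

      async function crawl(url: string): Promise<string> {
        const browser = await chromium.launch({ headless: true });
        const context = await browser.newContext(); // cookies live in this context
        const page = await context.newPage();

        // A real browser engine runs the site's JS, workers, and crypto calls,
        // so an interstitial challenge completes just as it would for a human.
        await page.goto(url, { waitUntil: "networkidle" });

        const html = await page.content();
        await context.storageState({ path: "state.json" }); // persist cookies for later runs
        await browser.close();
        return html;
      }

      crawl("https://example.org/some-page").then((html) => console.log(html.length));
      ```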

    • ... except when you do not crawl with a browser at all. It's trivial to solve, just as the taviso post demonstrated.

      This makes zero sense; it is simply the wrong approach. I'm already tired of saying so and getting attacked for it. So I'm glad professional-random-Internet-bullshit-ignorer Tavis Ormandy wrote this one.

  • All this is true, but also somewhat irrelevant. In reality the amount of actual hash work is completely negligible.

    For usability reasons, Anubis only requires you to go through the proof-of-work flow once in a given period. (I think the default is once per week.) That's very little work.

    Detecting that you need to occasionally send a request through a headless browser is far more of a hassle than the PoW. If you prefer LLMs to normal internet search, that will probably consume far more compute as well.
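
    As a rough sketch of why the amortised cost is so low, assuming the pass really is just a cookie (as described above) and reusing the state file a headless-browser run might leave behind, every follow-up request can be a plain HTTP fetch with the cookie replayed. The names below are placeholders.

    ```ts
    // Sketch only: replay a previously obtained pass cookie on plain requests.
    // "state.json" is assumed to come from an earlier headless-browser session
    // (e.g. Playwright's context.storageState()); the URL is a placeholder.
    import { readFileSync } from "node:fs";

    const state = JSON.parse(readFileSync("state.json", "utf8"));
    const cookieHeader = state.cookies
      .map((c: { name: string; value: string }) => `${c.name}=${c.value}`)
      .join("; ");

    // No JS engine and no hashing needed until the cookie expires.
    const res = await fetch("https://example.org/some-page", {
      headers: { cookie: cookieHeader },
    });
    console.log(res.status);
    ```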

    • > For usability reasons, Anubis only requires you to go through the proof-of-work flow once in a given period. (I think the default is once per week.) That's very little work.

      If you keep cookies. I do not want to keep cookies for otherwise "stateless" sites. I have maybe a dozen sites whitelisted; every other site loses cookies when I close the tab.

      2 replies →

I feel like the future will have this, plus ads displayed while the work is done, so websites can profit while you wait.

  • Every now and then I consider stepping away from the computer job, and becoming a lumberjack. This is one of those moments.

    • My family takes care of a large-ish forest, so I've had to help out since my early teens. Let me tell you: think twice, it's f*ckin dangerous. Chainsaws, winches, heavy trees falling and breaking in unpredictable ways. I've had a couple of close calls myself. Recently a guy from a neighboring village was crushed to death by a root plate that tipped over.

      I often think about quitting tech myself, but becoming a full-time lumberjack is certainly not an alternative for me.

      1 reply →

    • Worth getting to know the ins and outs of forest management now. I don't think AI will take most tech jobs soon, but it sure as hell is already making them boring.

  • adCAPTCHA already does this:

    https://adcaptcha.com

    • This is a joke, right? The landing page makes it seem so.

      I tried the captcha on their login page and it made the entire page, including the puzzle-piece slider, run at 2 fps.

      My god, we really do live in 2025.

    • Holy shit. Opening the demo from the menu, it's like captchas and YouTube ads had a baby.

I don't think anything will stop AI companies for long. They can run spot agentic checks on workflows that stop working for some reason, and the AI can usually figure out what the problem is and then update the workflow to get around it.