Comment by frameset

5 hours ago

It actually is.

I run a small video game forum with posts going back to 2008. We got absolutely smashed by bots scraping for training data for LLMs.

So I put it behind Cloudflare and now it's down. Ho hum.

Have you tried Anubis or similar tools? I've had similar issues with bot scraping of a forum taking all server resources, and using a PoW challenge solved the problem.

https://github.com/TecharoHQ/anubis
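
For anyone unfamiliar with the idea, here is a minimal hashcash-style sketch of a PoW challenge in Python. It is not Anubis's actual code or protocol; the function names, the SHA-256 choice, and the difficulty value are assumptions made for illustration. The server hands out a random challenge, the client burns CPU finding a nonce, and the server checks it with a single hash.

```python
import hashlib
import os


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: one hash to check the client's answer."""
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: brute-force nonces until the hash is hard enough."""
    nonce = 0
    while not verify(challenge, nonce, difficulty):
        nonce += 1
    return nonce


if __name__ == "__main__":
    challenge = os.urandom(16)  # hypothetical per-visitor challenge
    difficulty = 16             # assumed value; about 65k hashes on average
    nonce = solve(challenge, difficulty)
    print("nonce:", nonce, "valid:", verify(challenge, nonce, difficulty))
```

The asymmetry is the point: a single visitor pays a small one-off cost, while a scraper fetching many pages pays it over and over, and the server only ever does one hash per check.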

  • I did! It's very cool tech. However, for our config it was easier to slap CF in front of it.

    I will say one very appealing use of Anubis I'd love to try is running it as a Traefik middleware to protect services running in Docker containers.

Same problem here. If I didn't use Cloudflare, nearly all of my traffic would be (apparently misconfigured) scraper bots.

Can you please elaborate on “smashed”? I’m very interested

  • I took a screenshot of the graph in Cloudflare when I switched on the bot challenges.

    https://i.ibb.co/qHCJyY7/image.png

    I wrote the below to explain to our users what was happening, so apologies if the language is too simple for an HN reader.

    - 0630, we switched our DNS to proxy through CF, starting the collection of data, and implemented basic bot protections

    - Unfortunately, whatever anti-bot magic they have isn't quite having the desired effect, even after two hours.

    - 0830, I sign in and take a look at the analytics. It seems like <SITE NAME> is very popular in Vietnam, Brazil, and Indonesia.

    - 0845, I make it so users from those countries have to pass a CF "challenge". This is similar to a CAPTCHA, but CF try to make it so there's no "choosing all the cars in an image" if they can help it.

    - So far 0% of our Asian audience have passed a challenge.