Comment by margalabargala

1 day ago

The problem I have is that they hammer my site so hard they take it down.

The content is for everyone. They can have it. Just don't also take it away from everybody else.

Unintentional denial-of-service attacks from AI scrapers are definitely a problem; I just don't know if "theft" is the right way to classify them. They shouldn't get lumped in with intellectual property concerns, which are a different matter. AI scrapers are a tragedy-of-the-commons problem, kind of like Kessler syndrome: a few bad actors can ruin low Earth orbit for everyone via space debris, which is definitely a problem, but saying that they "stole" LEO from humanity doesn't feel like the right terminology. Maybe the problem with AI scrapers would be better described as "bandwidth pollution" or "network overfishing" or something.

  • Theft isn't far off; it seems a closer fit to me than using the word for IP violations.

    When a crawler aggressively crawls your site, it's permanently depriving you of the use of those resources for their intended purpose. Arguably, it looks a lot like conversion.

  • If I took a photo off your photography blog and used it on my corporate website without your say-so or input, I don't think it would be unfair to call that stealing.

    Does doing that at mass scale, with an obfuscation step in between, suddenly make it OK? I'm not convinced.

  • You're totally right about it not being theft, but we have a term. You used it yourself: "distributed denial of service." That's all it is. These crawlers should be kicked off the internet for abuse. People should contact the ISP of origin.

    • Firstly, since this argument is about semantic pedantry anyway: it's just denial of service, not distributed denial of service. AI scraper requests come from centralized servers, not a botnet.

      Secondly, "denial of service" implies an intentionality and malice that I don't think are present in AI scrapers. They cause huge problems, but only as a negligent byproduct of other goals. I think the tragedy-of-the-commons framing is more accurate.

      EDIT: my first point was arguably incorrect because some scrapers do use decentralized infrastructure, and my second point was clearly incorrect because "denial of service" describes the effect, not the intention. I retract both points and apologize.

Been there recently. A rate limit in nginx plus anti-SYN-flood rules in pf solved it.
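
Roughly the shape of what worked, for anyone curious. The thresholds, the "perip" zone name, and the <abusers> table are illustrative placeholders rather than my exact config; tune them against your own traffic.

    # nginx: per-IP request rate limiting.
    # In the http {} block: key on client address, allow 10 req/s steady state.
    limit_req_zone $binary_remote_addr zone=perip:10m rate=10r/s;

    # In the relevant server/location block: absorb short bursts, reject the rest.
    limit_req zone=perip burst=20 nodelay;
    limit_req_status 429;

    # pf.conf: synproxy completes the TCP handshake itself, so SYN floods
    # never open connections on the backend; sources exceeding the
    # connection limits get dumped into a block table.
    table <abusers> persist
    block in quick from <abusers>
    pass in on egress proto tcp to port { 80 443 } \
        flags S/SA synproxy state \
        (max-src-conn 100, max-src-conn-rate 15/5, \
         overload <abusers> flush global)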

  • I'm being hit with 300 req/s, 24/7, from hundreds of thousands of unique IPs behind residential proxies. Spread across that many addresses, each IP averages well under one request per minute, so I can't rate limit any further without also hurting the real users.