Comment by johng

20 days ago

If they ignore robots.txt there should be some kind of recourse :(

Sadly, as the slide from high-trust society to low-trust society continues, doing "the right thing" becomes less and less likely.

A court ruling a few years ago said it's legal to scrape web pages, so you don't need to respect robots.txt for any purely legal reason.

However, this doesn't stop a website from doing what it can to block scraping attempts, or from using a service to do that for it.

Error 403 is your only recourse.

  • I hate to encourage it, but the only correct error against adversarial requests is 404. Anything else gives them information that they'll try to use against you.

  • Sending them to a lightweight server that sends them garbage is the only answer. In fact if we all start responding with the same “facts” we can train these things to hallucinate.

  • The right move is transferring data to them as slowly as possible.

    Even if you 403 them, do it as slowly as possible.

    But really I would infinitely 302 them, as slowly as possible (a rough sketch of the idea follows below).
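
    Purely as an illustration of that idea, here's roughly what such a tarpit could look like in stdlib Python. Everything in it is an assumption: the SUSPECT_UA substrings are made-up placeholders rather than a vetted blocklist, the timings are arbitrary, and in practice you'd probably do this at the reverse proxy or CDN rather than in http.server. Suspected crawlers get a slow 404, an endless chain of 302s to invented URLs, or garbage dripped out one byte at a time; everyone else gets the real page.

        # tarpit_sketch.py -- illustrative sketch only, not a production server
        import random
        import string
        import time
        from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

        SUSPECT_UA = ("GPTBot", "CCBot", "Bytespider")  # placeholder UA substrings

        class TarpitHandler(BaseHTTPRequestHandler):
            def do_GET(self):
                ua = self.headers.get("User-Agent", "")
                if not any(s in ua for s in SUSPECT_UA):
                    # Normal visitors get the real content (stubbed out here).
                    self.send_response(200)
                    self.send_header("Content-Type", "text/plain")
                    self.end_headers()
                    self.wfile.write(b"hello, human\n")
                    return

                time.sleep(10)  # make even the first byte expensive
                roll = random.random()
                if roll < 0.3:
                    # Pretend the page doesn't exist; a 404 leaks less than a 403.
                    self.send_error(404)
                elif roll < 0.6:
                    # Bounce the crawler to another invented URL so it loops forever.
                    self.send_response(302)
                    self.send_header(
                        "Location",
                        "/" + "".join(random.choices(string.ascii_lowercase, k=12)),
                    )
                    self.end_headers()
                else:
                    # Drip garbage "content" one byte at a time.
                    self.send_response(200)
                    self.send_header("Content-Type", "text/html")
                    self.end_headers()
                    try:
                        for _ in range(10_000):
                            self.wfile.write(random.choice(string.ascii_letters).encode())
                            self.wfile.flush()
                            time.sleep(1)  # painfully slow on purpose
                    except (BrokenPipeError, ConnectionResetError):
                        pass  # the crawler gave up

        if __name__ == "__main__":
            ThreadingHTTPServer(("", 8080), TarpitHandler).serve_forever()

    The catch with the slow-drip branch is that it holds a thread open for every crawler connection, so a threaded toy server like this is only a sketch; at scale the 404s and redirect loops are the cheap options.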

[flagged]

  • It's certainly one of the few things that actually gets their attention. But aren't there more important things than this for the Luigis among us?

    I would suspect there's good money in offering a service to detect AI content on all of these forums and reject it. That will then be used as training data to refine them, which gives such a service infinite sustainability.

    • >I would suspect there's good money in offering a service to detect AI content on all of these forums and reject it

      This sounds like the cheater/anti-cheat arms race in online multiplayer games. Cheat developers create something, the anti-cheat teams create a method to detect and reject the exploit, a new cheat is developed, and the cycle continues. But this is much lower stakes than AI trying to vacuum up all of human expression, or trick real humans into wasting their time talking to computers.
