Comment by Paracompact

10 hours ago

Don't know if it helps your musings at all, but there's a good chance that if a high-profile crawler like archive.org disrespected robots.txt, it would be faced with lawsuits (or some other form of pressure). Respecting it is not merely the most moral move; it is the only sensible one.

The only reason "others are rewarded with profit" in cases like these is that pinkie-promise-style obligations don't affect players too small or shadowy to bother litigating.

>pinkie-promise-style obligations don't affect players too small or shadowy to bother litigating

I think you're looking at the wrong end of the spectrum there. It's some of the biggest players who flout the rules.

"Several AI companies said to be ignoring robots dot txt exclusion, scraping content without permission: report" (2024) https://www.tomshardware.com/tech-industry/artificial-intell...

  • Fair point. Being small and shadowy is a sufficient condition to avoid litigation, but not a necessary one. Another sufficient condition is having billions of dollars to throw around. Unfortunately, archive.org is well known, well loved, and fundamentally harmless.