Comment by riskable

7 months ago

Google trained their AI on stuff they scraped without knowing whether it was pirated content. Why should it be different for Anthropic?

Google literally scrapes pirated content all day, every day. When they do that, they have no idea whether the content was legally placed on that website. Yet they scan and index it anyway because there's actually no way to know (at all!). There's no great big database of all copyrighted works they can reference.

I'm not saying Meta and Anthropic didn't know they were pirating content. I'm saying that it should be moot since they never distributed it. You can't claim a violation of copyright for content that was never actually "copied" (aka distributed). The site/seeders that uploaded the content to Meta/Anthropic are the violators since copyright is all about distribution rights.