Comment by trymas
13 hours ago
> Anthropic's IP was created by harvesting and "distilling" other people's IP. Copyrighted materials, and the commons... which they have essentially privatized.
Anthropic and others argue that because LLMs don’t output full copyrighted works word for word - hence their LLMs aren’t infringing on copyright laws.
I think that (if it ever comes to that) Chinese labs should use the same arguments against Anthropic.
UPDATE: this is a slight hyperbole, of course; it's not worth arguing over what they actually said. The point is the intent and the facts: "The Big LLMs" "distilled" collective knowledge, including copyrighted works, at an unimaginable scale, but it's all kosher and totally not piracy/copyright infringement. Yet if you're a teenager torrenting an mp3, you'll get screwed.
> LLMs don’t output full copyrighted works word for word
Apparently they do, according to the evidence in the NYT v. OpenAI lawsuit.
Isn’t the output of LLMs completely copyright-free in the US?
One lower court has said that the output of AI models is uncopyrightable.
But the real unsettled issues are whether model training is fair use and where copyright infringement might creep into model output.
The Copyright Office itself also says this when it discusses how authorship is determined.
> Anthropic and others argue that because LLMs don’t output full copyrighted works word for word - hence their LLMs aren’t infringing on copyright laws.
That surely can't be what they argue, because I'm sure I can't translate a copyrighted book into a different language and say "that's fine, it's not word-for-word".