Comment by Workaccount2
3 months ago
Training on copyright is not illegal. Even in the lawsuit against Anthropic it was found to be fair use.
Pirating material is a violation of copyright, which some labs have done, but that has nothing to do with training AI and everything to do with piracy.
If my for-profit/for-sale product couldn't exist without inputting copyrighted works into it, then my product is derivative of those works. It's a pretty simple concept. No 'but human brains learn'. Humans aren't a corpo's for-profit product.
'Would this product have the same value without the copyrighted works?'
If yes then it's not derivative. If no then it is.
There is US precedent for training being deemed not fair use. https://www.dglaw.com/court-rules-ai-training-on-copyrighted...
Why wouldn’t training be illegal? It’s illegal for me to acquire and watch movies or listen to songs without paying for them*. If consuming copyrighted material isn’t fair use, then it doesn’t make sense that AI training would be fair use.
* I hope it’s obvious but I feel compelled to qualify that, of course, I’m talking about downloading (for example torrenting) media, and not about borrowing from the library or being gifted a DVD, CD, book or whatever, and not listening/watching one time with friends. People have been successfully prosecuted for consuming copyrighted material, and that’s what I’m referring to.
That interpretation is not correct. The owner explicitly denied license to the data and then the company went to a third party to gain access to the data that they were denied license to.
> When building its tool, Ross sought to license Westlaw’s content as training data for its AI search engine. As the two are competitors, Thomson Reuters refused. Instead, Ross hired a third party, LegalEase, to provide training data in the form of “Bulk Memos,” which were created using Westlaw headnotes. Thomson Reuters’s suit followed, alleging that Ross had infringed upon its copyrighted Westlaw headnotes by using them to train the AI tool.
You’re contradicting the conclusion / interpretation written on dglaw.com? What is incorrect, exactly? Your summary doesn’t seem to challenge either my comment or the article I linked to, so it’s not clear what you’re arguing. The court did find in this case that using the unlicensed data for AI training was not fair use.
> Training on copyright is not illegal.
The court decision this thread is about holds that it is, on the grounds that the training data was copied to the LLM's memory.