But you can’t distribute it, which in the scenario mentioned in the parent’s final paragraph arguably happens.
You can't distribute the copyrighted works, but that isn't inherently the same thing as the model.
It's sort of like distributing a compendium of book reviews. Many of the reviews have quotes from the book. If there are thousands of reviews, you could potentially reconstruct the whole book, but that's not the point of the compendium, so it makes sense for the infringing act to be "using it to reconstruct the whole book" rather than "distributing the compendium".
And then Anthropic fended off the argument that their service was intended for doing the former because they were explicitly taking measures to prevent that.
The premise was that the model is able to reproduce the memorized text, and that what saved Anthropic was them having server-side filtering to avoid reproducing that text. So the presumption is that without those filters, the model would be able to reproduce text substantial enough to constitute a copyright violation (otherwise they wouldn’t need the filter argument). Distributing a “machine” producing such output would constitute copyright infringement.
Maybe this is a misrepresentation of the actual Anthropic case, I have no idea, but it’s the scenario I was addressing.
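To make the filtering idea concrete, here is a toy sketch of what server-side output filtering could look like (entirely hypothetical; it doesn't reflect how Anthropic or anyone else actually implements this):

    # Toy sketch of server-side output filtering: refuse to return any
    # generation that reproduces a long verbatim span from a protected corpus.
    # Entirely hypothetical; not any real provider's implementation.
    def contains_verbatim_span(output: str, protected_texts: list[str],
                               min_chars: int = 200) -> bool:
        for text in protected_texts:
            step = min_chars // 2
            for start in range(0, max(len(text) - min_chars, 0) + 1, step):
                if text[start:start + min_chars] in output:
                    return True
        return False

    def serve(model_output: str, protected_texts: list[str]) -> str:
        if contains_verbatim_span(model_output, protected_texts):
            return "[response withheld]"
        return model_output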
You can also, in the US, use it for any purpose that falls within "fair use". Fair use is now incorporated into the copyright statute, but it was first identified as an application of the First Amendment and, as such, a constitutional limit on what Congress even had the power to prohibit with copyright law (the odd parameters of the statutory exception exist largely because it attempted to codify the existing constitutional case law).
Purposes which are fair use are very often not at all personal.
(Also, "personal use" that involves copying, creating a derivative work, or using any of the other exclusive rights of a copyright holder without a license or falling into either fair use or another explicit copyright exception are not, generally, allowed, they are just hard to detect and unlikely to be worth the copyright holder's time to litigate even if they somehow were detected.)
Hey, can I have a fake LLM "trained" on a set of copyrighted works, so I can ask it what those works are?
So it totally isn't a warez streaming media server, it's AI?
I'm guessing that since my net worth isn't a billion plus, the answer is no.
People have been coming up with convoluted piracy loopholes since the invention of copyright.
If you xor some data with random numbers, both the result and the random numbers are indistinguishably random and there is no way to tell which one came out of a random number generator and which one is "derived" from a copyrighted work. But if you xor them together again the copyrighted work comes out. So if you have Alice distribute one of the random looking things and Bob distribute the other one and then Carol downloads them both and reconstructs the copyrighted work, have you created a scheme to copy whatever you want with no infringement occurring?
Of course not: Carol, at least, is reproducing an infringing work, and then there are going to be claims of contributory infringement etc. against the others if the scheme has no purpose other than this.
Meanwhile, this problem is also boring, because preventing anyone from being the source of infringing works isn't something anybody has been able to do for at least as long as the internet has allowed anyone to set up a server in another jurisdiction.
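For concreteness, the Alice/Bob/Carol scheme above is just a one-time-pad split; a toy Python sketch (names and the sample bytes are purely illustrative):

    import secrets

    def split(work: bytes) -> tuple[bytes, bytes]:
        # The pad is uniformly random, so each share on its own is
        # indistinguishable from random noise.
        pad = secrets.token_bytes(len(work))
        share = bytes(w ^ p for w, p in zip(work, pad))
        return pad, share  # Alice hosts one file, Bob hosts the other

    def reconstruct(pad: bytes, share: bytes) -> bytes:
        # Carol xors the two "random" files together and gets the work back.
        return bytes(a ^ b for a, b in zip(pad, share))

    work = b"the copyrighted work"
    alice_file, bob_file = split(work)
    assert reconstruct(alice_file, bob_file) == work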