Comment by mattigames
14 hours ago
Yes, it does. The spirit of the law matters in many cases. A fair ruling would have declared that authors must be able to forbid the usage of their work as training data for any given model because the "transformative" processes that are being executed are wildly beyond what the writers of the law knew were even possible when those laws were written.
> Copyright was built to protect the artist from unauthorized copying by a human, not by a machine (a machine wildly beyond their imagination at the time, I mean), so the input and output limitations of humans were absolutely taken into account when writing such laws
Copyright law was spurred by the spread of the printing press, a machine with the ability to output full replicas. It does not assume human-like input/output limitations.
> A fair ruling would have declared that authors must be able to forbid the usage of their work as training data for any given model because the "transformative" processes that are being executed are wildly beyond what the writers of the law knew were even possible
Copyright's basis in the US is "To promote the Progress of Science and useful Arts". Declaring a transformative use illegal because it's so novel would seem to run directly counter to that.
To my understanding it's generally the opposite (a pre-existing use with an established market that the rightsholder had expected to exploit) that would weigh against a finding of fair use.
The spirit of the law matters, but there are limits to how far existing statutes can be stretched to cover novel scenarios. Seems to me like new laws may be necessary to keep up (whatever people would prefer them to be).
I made two points:
- It is not accurate to describe training as “encoding works into the model”.
- A model cannot recreate a Harry Potter book.
Neither of these has anything to do with “the spirit of the law”.
Can it not recreate a book?
I kind of assumed I could ask it for verses from the Bible one by one until I have the full book?
When I ask ChatGPT for a specific page or so from HP, I get the impression that the model would be perfectly capable of doing so but is hindered by extra work OpenAI put in to prevent the answer, specifically because of copyright. Which raises the question: what if someone manages to use prompt trickery again to get past it? Are they then responsible?
No, it can’t recreate a book. Well, maybe it could get most of the way for the Bible. That is an exceptional case because its adherents are constantly quoting verses religiously. I expect it’s the most reproduced, quoted, and translated book in history by a very significant margin. It’s also not copyrighted.
Can you do this for the general case? No, not even for extremely popular books. People might quote Harry Potter a lot, but they don’t quote the entire thing over and over, chapter and verse, on hundreds of thousands of different websites. The number of times Bible verses appear in the training data is going to absolutely dwarf the number of times Harry Potter quotes appear, and people aren’t quoting all parts of Harry Potter, just the interesting parts.
> When I ask ChatGPT for a specific page or so from HP, I get the impression that the model would be perfectly capable of doing so but is hindered by extra work OpenAI put in to prevent the answer, specifically because of copyright.
They do put extra work in to filter this stuff out, but even if they didn’t the model wouldn’t be able to reproduce entire chapters, let alone entire books.
You can test this for yourself. Remember, this lawsuit isn’t against OpenAI, it’s against Meta. Download Llama and try to get it to reproduce Harry Potter. There won’t be any guardrails imposed on top of the model if you run it locally.
> proportionality threshold for copyright to matter.
This is the part I have a problem with. That threshold was put there for humans based on their capabilities, and it's an extremely dishonest assessment to say the same threshold must apply to an LLM and its outputs. Those works were created to be read by humans, not by a for-profit statistical inference machine, and derivative works were also expected to come from the former, not the latter. So the judge should have admitted that the context of the law is insufficient and that copyright must include the power to forbid the use of one's work in such models for copyright to continue fulfilling its intended purpose (or moved the case to the Supreme Court, I guess).
> that threshold was put there for humans based on their capabilities
It wasn’t. It’s there because a small proportion being reproduced doesn’t harm the copyright holder in the same way a full reproduction does.
Nobody is going to stop buying Harry Potter books because they can get an LLM to spit out ~50 words from the book. This is entirely in line with the spirit of the law. This is exactly why proportionality is a factor in fair use.