
Comment by realusername

12 hours ago

> Hmm, training on a book’s text smears the content all over the weights, merging it with all other texts. The original text isn’t intentionally supposed to be reproducible in any larger part (although IIRC models were able to emit fairly large chunks verbatim).

I agree with that, however that doesn't make the output copyrightable then.

I think these AI companies live in a legal fantasy where they can take any content they want, put it into the mixer without caring about copyright and then what comes out of it is somehow copyrighted.

They have to pick one or the other: either the content's copyright taints the model, or it doesn't, but then the model isn't subject to copyright either.

> those are human expectation models, something like how we train animals or teach our own.

But more importantly, made by machines, and one of the requirements for copyright is the human factor.

> I agree with that, however that doesn't make the output copyrightable then.

It could be - databases are copyrightable. It's long established that if you put some effort into categorizing and processing information, you get rights for that work. Basically, you can get rights to a phone book or a map, even if the individual bits are not copyrightable. You can also get rights on a compilation or a catalog of other copyrighted works - although the original authors' rights remain. But there's a legal trick to avoid liability even if you infringe: the fair use doctrine.

> where they can take any content they want, put it into the mixer without caring about copyright and then what comes out of it is somehow copyrighted.

Yes. It's not a legal fantasy, though - that's what they actually pulled off, as far as I understand it (and, again, IANAL, just a layman who's interested in this stuff a little bit). They argued their work is transformative enough for the fair use doctrine to shield them from liability on copyright infringement claims. And courts seemed to agree, making this fantasy a reality. That's just how the legal system works.

The model is still a derived work (AFAIK there's no legal way to clear that) of all the books and articles and whatever else is copyrightable (plus a ton more non-copyrightable stuff), but there's no liability for training on all that stuff, because courts have ruled - and that happens on a case-by-case basis - that it falls under fair use.

And there's the difference: now Anthropic argues that copying the behavior verbatim is not transformative enough to shield Alibaba from liability by invoking fair use. Now it's up to the courts (if they sue and don't just do the PR dances) to check it out.

Disclaimer: first, I'm not a legal expert, and second, I'm not arguing whether anything is right or wrong, just mapping what happened or is being argued to what I know about copyright.

> I think these AI companies live in a legal fantasy where they can take any content they want, put it into the mixer without caring about copyright and then what comes out of it is somehow copyrighted.

The mixer you're talking about is what they seem to claim to be transformative use, no? Unless I'm misunderstanding something, it's not a legal fantasy.

  • > The mixer you're talking about is what they seem to claim to be transformative use, no? Unless I'm misunderstanding something, it's not a legal fantasy.

    If it's transformative use, then it's transformative use of ... what exactly? Copyrighted works? I think the law is pretty clear on what happens on transformative use of copyrighted works.

    • > it's transformative use of ... what exactly? Copyrighted works?

      Yes. Among other stuff, but non-copyrighted stuff is not exactly an issue so it can be left out of our focus most of the time.

      > I think the law is pretty clear on what happens on transformative use of copyrighted works.

      Ah, if only - it's not. You could be mixing it up with the concept of a derived work - that's where the law is pretty clear (I think). AFAIK (IANAL), transformativeness is merely one suggestive factor in a fair use analysis, and beyond that it's all "whatever the court decides", with a bunch of guidelines and precedents.