Comment by maxloh

1 day ago

To my understanding, if the material is publicly available or obtained legally (i.e., not pirated), then training a model with it falls under fair use, at least in the US and some other jurisdictions.

If the training is established as fair use, the underlying license doesn't really matter. The term you added would likely be void or deemed unenforceable if someone ever took it to court.

This is at least murky, since a lot of pirated material is “publicly available”. Certainly some has ended up in the training data.

  • It isn't? You have to break the law to get it. It's publicly available like your TV is if I were to break into your house and avoid getting shot.

    • Maybe you have some legalistic point that escapes comprehension, but I certainly consider my house to be very much private and the internet public.

I wouldn't say this is settled law, but it looks like this is one of the likely outcomes. It might not be possible to write a license to prevent training.

  • Isn't the court fight on fair use failing pretty hard on the prong that flooding the market with cheap copies eliminates the market for the original work?

Fair use was for citing and so on, not for ripping off 100% of the content.

  • Copyright protects the expression of an idea, not the idea itself. Therefore, an LLM transforming concepts it learned into a response (a new expression) would hardly qualify as copyright infringement in court.

    This principle is also explicitly declared in US law:

    > In no case does copyright protection for an original work of authorship extend to any idea, procedure, process, system, method of operation, concept, principle, or discovery, regardless of the form in which it is described, explained, illustrated, or embodied in such work. (Section 102(b) of the U.S. Copyright Act)

    https://www.copyrightlaws.com/are-ideas-protected-by-copyrig...

    • Re-encoding a video file doesn't get rid of the copyright; therefore, doing some automatic processing on copyrighted material doesn't remove the copyright either.

      The problem is that OpenAI has too much money. But if I did what they are doing, I'd get into massive legal trouble.
