Comment by j_w

6 months ago

When you say that's the law, as far as I'm aware a single ruling by a lower court has been issued which upholds that application. Hardly settled case law.

True; until then, it's best to act as if it is the case.

In my opinion, it will be upheld.

Looking at what is stored and the manner in which it is stored, it makes sense that it's fair use.

  • We're talking about a summary judgment that has not yet been appealed. That doesn't make it "settled."

    If "what is stored and the manner in which it is stored" is meant to refer to model weights, I'm not sure what the argument is. The four fair use factors make no mention of a storage medium for data, lossless or lossy.

    (1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and (4) the effect of the use upon the potential market for or value of the copyrighted work.

    In my opinion, this will likely see a Supreme Court ruling by the end of the decade.

    • The use is to train an AI model.

      A trillion-parameter SOTA model is not substantially composed of any one copyrighted work. (If it were a Harry Potter model trained only on Harry Potter books, this would be a different story.)

      Embeddings are not copy-paste.

      The last point, about market impact, is where they would make their argument, but it's tenuous. Reproducing copyrighted works is not the primary use of AI models, and built-in prompts try to prevent it, so it shouldn't be commonplace unless you're jailbreaking the model, which most folks aren't.