Comment by egorfine

4 hours ago

Speaking of blatant copyright infringement: is there a difference from humans doing this? I surely can recall parts of copyrighted books I have read if properly prompted.

IANAL, but wouldn't this LLM behavior be more akin to a human re-publishing an entire book to some third party, in exchange for money?

  • The whole world would not be possible without people re-publishing parts of books to some third party in exchange for money.

    Think textbooks. Laws. Medicine.

    What's the difference? The size of quotation? The exact wording? Surely re-publishing an entire book word for word is piracy. What if I rewrite the whole book slightly? What if I publish just a part? A rewritten part?

    Where do we draw the line with humans and why should the line be different with LLMs?

    (I don't have answers to those questions)

    • Your questions would be quickly answered by looking at the relevant style guides. Any university will also have a webpage about citations: APA, Chicago, MLA, etc.

I doubt you would ever blurt out a copyrightable portion of a book without realizing that's what you're doing. That's the biggest difference.

In particular, you are a legal person who can be sued in civil court if you infringe on copyright. If I ask you "can you help me write a blog about Manhattan?" and you plagiarize the New York Times, and the NYT then sues me for copyright infringement, I would correctly assume you conned me and are responsible for the infringement, and I would vindictively drag you into the lawsuit with me. With LLMs it involves dragging in a corporation, which is much, much uglier. Claude is not actually a person and cannot testify in any legally legitimate trial. (I am sure it will happen soon in some kangaroo court.)

  • True. What if I reword a copyrighted portion slightly?

    See, the line is blurry.

    • Yes, we've known the line is blurry for hundreds of years; that's why we have courts. That has nothing to do with the specific problem of LLMs infringing copyright. LLMs need to be held to much higher scrutiny because they are not capable of taking legal responsibility for copyright infringement, regardless of whether it's verbatim or a more ambiguous case, and their users can't be expected to know offhand whether the output is copyrighted or not.

      Based on this comment: https://news.ycombinator.com/item?id=47960014, it seems like you are just ignorant about the basics of copyright law and are pretending this ignorance is some sort of flaw in the idea of copyright itself.