Comment by afro88

9 hours ago

Same as if a regular person did the same. They are responsible for it. If you're using AI, check the code doesn't violate licenses

In certain law cases plagiarization can be influenced by the fact if person is exposed to the copyrighted work. AI models are exposed to very large corpus of works..

  • Copyright infringement and plagiarism are not the same or even very closely related. They're different concepts and not interchangeable. Relative to copyright infringement, cases of plagiarism are rarely a matter for courts to decide or care about at all. Plagiarism is primarily an ethical (and not civil or criminal) matter. Rather than be dealt with by the legal system, it is the subject of codes of ethics within e.g. academia, journalism, etc. which have their own extra-judicial standards and methods of enforcement.

    • I suspect they were instead referring to patents; for example, when I worked at Google, they told the engineers not to read patents because then the engineer might invent something infringing, I think it's called willful infringement. No other employer I've worked for has every raised this as an issue, while many lawyers at google would warn against this.

As opposed to an irregular person?

LLMs are not persons, not even legal ones (which itself is a massive hack causing massive issues such as using corporate finances for political gain).

A human has moral value a text model does not. A human has limitations in both time and memory available, a model of text does not. I don't see why comparisons to humans have any relevance. Just because a human can do something does not mean machines run by corporations should be able to do it en-masse.

The rules of copyright allow humans to do certain things because:

- Learning enriches the human.

- Once a human consumes information, he can't willingly forget it.

- It is impossible to prove how much a human-created intellectual work is based on others.

With LLMs:

- Training (let's not anthropomorphize: lossily-compressing input data by detecting and extracting patterns) enriches only the corporation which owns it.

- It's perfectly possible to create a model based only on content with specific licenses or only public domain.

- It's possible to trace every single output byte to quantifiable influences from every single input byte. It's just not an interesting line of inquiry for the corporations benefiting from the legal gray area.

  • Dude come on, I clearly wasn't saying LLMs are people. My point was it's a tool and it's the responsibility of the person wielding it to check outputs.

    If it's too hard to check outputs, don't use the tool.

    Your arguments about copyright being different for LLMs: at the moment that's still being defined legally. So for now it's an ethical concern rather than a legal one.

    For what it's worth I agree that LLMs being trained on copyright material is an abuse of current human oriented copyright laws. There's no way this will just continue to happen. Megacorps aren't going to lie down if there's a piece of the pie on the table, and then there's precedent for everyone else (class action perhaps)

How could you do that though? You can’t guarantee that there aren’t chunks of copied code that infringes.

  • But the responsible party is still the human who added the code. Not the tool that helped do so.

    • The practical concern of Linux developers regarding responsibility is not being able to ban the author, it's that the author should take ongoing care for his contribution.

    • In a court case the responsibility party very well could be the Linux foundation because this is a foreseeable consequence of allowing AI contributions. There’s no reasonable way for a human to make such a guarantee while using AI generated code.

      19 replies →