Comment by david_allison

1 day ago

> Genuine question: if I train my model with copyleft material, how do you prove I did?

It may produce it when asked

https://chatgpt.com/share/678e3306-c188-8002-a26c-ac1f32fee4...

> It may produce it when asked

That's not proof; it may also be intelligent enough to have produced similar expressions without the original training data.

Not to mention that having knowledge of copyrighted material is not a violation of any known copyright law; after all, human brains also have that knowledge after learning it. The model, therefore, cannot be in violation regardless of what data was used to train it (as long as that data was not obtained illegally).

If someone _chooses_ to use the LLM to reproduce Harry Potter, or some GPL'ed code, then that person would be in violation of the relevant copyright laws. The copyright owner needs to pursue that person, rather than the owner of the LLM. In exactly the same way, if someone used Microsoft Word to reproduce Harry Potter, Microsoft would not have any liability.