Comment by WithinReason

1 day ago

Wouldn't it be still legal to train on the data due to fair use?

I don't think it's fair use, but everyone on Earth disagrees with me. So even with the standard default licence that prohibits absolutely everything, humanity minus one considers it fair use.

  • Honest question: why don’t you think it is fair use?

    I can see how it pushes the boundary, but I can’t lay out the logic for why it’s not. The code has been published for the public to see. I’m always allowed to read it, remember it, tell my friends about it. Certainly, this is what the author hoped I would do. Otherwise, wouldn’t they have kept it to themselves?

    These agents are just doing a more sophisticated, faster version of that same act.

    • Some projects, like Wine, forbid you from contributing if you have ever seen the source of MS Windows [1]. The meatball inside your head is tainted.

      I don't remember the exact case now, but someone was cloning a program (Lotus 1-2-3 -> Quattro or Excel???). They printed every single screen and had one team write a full specification in English. Later a separate team looked at the screenshots and text and reimplemented it. Apparently meatballs can get tainted, but the plain-English-text loophole was safe enough.

      [1] From https://gitlab.winehq.org/wine/wine/-/wikis/Developer-FAQ#wh...

      > Who can't contribute to Wine?

      > Some people cannot contribute to Wine because of potential copyright violation. This would be anyone who has seen Microsoft Windows source code (stolen, under an NDA, disassembled, or otherwise). There are some exceptions for the source code of add-on components (ATL, MFC, msvcrt); see the next question.

    • Before LLMs, programmers had a pretty good intuition for what the GPL allowed. It is of course clear that you cannot release a closed-source program with GPL code integrated into it. I think it was also quite clear that you cannot legally incorporate GPL code into such a program by making changes here and there, renaming some stuff, and moving things around, but this is pretty much what LLMs are doing. When humans do it intentionally, it is a violation of the license. When it is automated and done on a huge scale, is it really fair use?

    • The problematic fair use prong is the one about market effect: the use can't decimate the value of the original work. It's the difference between me imitating your art style for a personal project and me making 1,000,000 copies of your art so that your art isn't worth much anymore. One is fair use, the other is exploitative extraction.

  • Only corporations, their shills, and people who think LLMs are God's gift to humanity disagree with you.