Comment by bwfan123

1 day ago

Sometimes LLMs actually generate copyright headers in their output as well - lol - like in this PR, which was the subject of a recent HN post [1]

[1] https://news.ycombinator.com/item?id=46039274

I once had a well-known LLM reproduce pretty much an entire file from a well-known React library verbatim.

I was writing code in an unrelated programming language at the time, and the bizarre inclusion of that particular file in the output was presumably because the name of the library was very similar to a keyword I was using in my existing code, but this experience did not fill me with confidence about the abilities of contemporary AI. ;-)

However, it did clearly demonstrate that LLMs with billions or even trillions of parameters certainly can embed enough information to reproduce some of the material they were trained on verbatim or very close to it.

So what? I can probably produce parts of the header from memory. Doesn't mean my brain is GPLed.

  • If you have seen, say, the Windows source code, you cannot take certain jobs implementing Windows-compatible interfaces that are supposed to be free from Microsoft IP. One could say your brain has been "infected". The same is true of many things around intellectual property.

  • There is a stupid presupposition that LLMs are equivalent to human brains, which they clearly are not. Stateless token generators are OBVIOUSLY not like human brains, even if you somehow contort the definition of intelligence to include them.

    • Even if they are not "like" human brains in some sense, are they "like" brains enough to be counted similarly in a legal environment? Can you articulate the difference as something other than meat parochialism, which strikes me as arbitrary?

  • Not your brain, but the code you produce, if it includes portions of GPL code that you remembered.

  • > Doesn't mean my brain is GPLed.

    It would be if they could get away with it. The likes of Disney would delete your memories of their films if they could get away with it. If you want to enjoy the film, you should have to pay them for the privilege, not recall the last time you watched it.

    • Imma pitch them a cinema exit turnstile with a barcode reader and a bat. You pay the retention tax or you get bonked. Once they see the ROI we can expand service via collaboration with services like Uber to ensure equal experience quality at home.

  • > So what? I can probably produce parts of the header from memory. Doesn't mean my brain is GPLed.

    Your brain is part of you. Some might say it is your very essence. You are human. Humans have inalienable rights that sometimes trump those enshrined by copyright. One such right is the right to remember things you've read. LLMs are not human, and thus don't enjoy such rights.

    Moreover, your brain is not distributed to other people. It's more like a storage medium than a distribution. There is a lot less furore about LLMs that are used purely as storage mediums, where neither they nor their outputs are distributed. Such LLMs are obviously not very useful.

    So your analogy is poor.