Comment by pyman

6 months ago

What does "fair use" even mean in a world where models can memorise and remix every book and song ever written? Are we erasing ownership?

The problem is, copyright law wasn't written for machines. It was written for humans who create things.

In the case of songs (or books, paintings, etc.), only humans and companies can legally own copyright; a machine can't. If an AI-powered tool generates a song, there's no author in the legal sense, unless the person using the tool claims authorship on the grounds that they operated it.

So we're stuck in a grey zone: the input is human, the output is AI-generated, and the law doesn't know what to do with that.

For me, the real debate is: do we need new rules for non-human creation?

Why are you saying "memorize"? Are people training AIs to regurgitate exact copies? If so, that's just copying. If they return something that is not a literal copy of the whole work, then there is established caselaw about how much borrowing is permitted. Some clearly is, but not entire works.

when you buy a book, you are not acceding to a license to only ever read it with human eyes, forbearing to memorize it, never to quote it, never to be inspired by it.

  • > Specifically, the paper estimates that Llama 3.1 70B has memorized 42 percent of the first Harry Potter book well enough to reproduce 50-token excerpts at least half the time. (I’ll unpack how this was measured in the next section.)

    > Interestingly, Llama 1 65B, a similar-sized model released in February 2023, had memorized only 4.4 percent of Harry Potter and the Sorcerer's Stone. This suggests that despite the potential legal liability, Meta did not do much to prevent memorization as it trained Llama 3. At least for this book, the problem got much worse between Llama 1 and Llama 3.

    > Harry Potter and the Sorcerer's Stone was one of dozens of books tested by the researchers. They found that Llama 3.1 70B was far more likely to reproduce popular books—such as The Hobbit and George Orwell’s 1984—than obscure ones. And for most books, Llama 3.1 70B memorized more than any of the other models.

  • You are comparing AI to humans, but they're not the same. Humans don't memorise millions of copyrighted works and spit out similar content. AI does.

    Memorising isn't wrong, but when machines memorise at scale and the people behind the original work get nothing, it raises big ethical questions.

    The law hasn't caught up.

    • As a former musician, yes, we do. Any above average musician can play "Riders on the Storm" in the style of Johnny Cash, or Green Day, or Nirvana, etc. Successful above average musicians usually have almost encyclopedic knowledge of artists and albums at least in their favorite genre. This is how all art is made. Some artists will be more honest about this than others.


  • The vast majority of pirated copies are not literal copies. Movies and music get constantly transformed into different sizes and scales, mostly with lossy transformations that change the work. A movie taken in raw format and re-encoded down to 144p contains far less than 1% of the original data and is barely recognizable, yet copyright law seems to recognize that as infringement.

    Most AI models seem much better at reproducing semi-identical copies of an original work than existing video/audio encoders are.
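For what the quoted paper's criterion ("reproduce 50-token excerpts at least half the time") amounts to arithmetically: one natural reading is that the model assigns probability greater than 0.5 to the exact 50-token continuation of a prefix from the book, where that probability is the product of the per-token probabilities. A minimal sketch of that threshold, with made-up per-token probabilities standing in for a real model (this is an illustration of the math, not the paper's actual code):

```python
import math

def excerpt_probability(token_probs):
    """Probability of reproducing the whole excerpt verbatim:
    the product of the model's probability for each successive
    ground-truth token (computed in log space for stability)."""
    return math.exp(sum(math.log(p) for p in token_probs))

def is_memorized(token_probs, threshold=0.5):
    """Apply the >50% reproduction criterion to one excerpt."""
    return excerpt_probability(token_probs) > threshold

# A model putting ~0.99 on each of 50 true tokens clears the bar:
# 0.99**50 is roughly 0.605, so the excerpt counts as memorized.
print(is_memorized([0.99] * 50))  # True

# Even 0.97 per token fails over 50 tokens: 0.97**50 is about 0.22.
print(is_memorized([0.97] * 50))  # False
```

The point of the toy numbers: clearing a 50% bar over a 50-token run requires near-certainty on essentially every token, which is why this criterion is treated as strong evidence of memorization rather than mere stylistic similarity.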