Comment by petesergeant

3 months ago

Yeah? Which commercial provider’s model do you think was trained without using lyrics?

14 comments

petesergeant

The point is that some other vendor will do the work to implement the filtering required by Germany even if OpenAI doesn't.

aniviacat 3 months ago

I would imagine providers who want to comply will scan the LLM's output and pay a license fee to the owner if it contains lyrics.

petesergeant 3 months ago
They scan for commercial work already. Isn’t the law about training, not output?
- aniviacat 3 months ago
  
  Perhaps; I didn't read the court ruling.
  But I'd be surprised if that was generally the case. It's easy to see why ChatGPT 1:1 reproducing a song's lyrics would be a copyright issue. But creating a derivative work based on the song?
  What if I made a website that counts the number of alliterations in certain songs' lyrics? Would that be copyright infringement, because my algorithm uses the original lyrics to derive its output?
  If this ruling really applied to any algorithm deriving content from copyright-protected works, it would be pretty absurd.
  But absurd copyright laws would be nothing new, so I won't discount the possibility.
  
- dathinab 3 months ago
  
  They clearly didn't do that properly, or we wouldn't have the current lawsuit.
  The lawsuit was also not about whether it is or isn't copyright infringement. It was about who is responsible (OpenAI or the user who tries to bait it into making another illegal copy of song lyrics).
  A model outputting song lyrics means it has them stored somehow, somewhere. Just because the storage is a lossy, compressed, obscure high-dimensional transformation of some kind doesn't mean it didn't store an illegal copy; otherwise it wouldn't have been able to output it. _Technical details do not protect from legal responsibilities (in general)._
  You could (maybe should) add new laws which in some form treat things an LLM has memorized the same as if a human had memorized them, but currently LLMs have no special legal treatment when it comes to storing copies of things.
- Semaphor 3 months ago
  
  No, it’s specifically about reproducing big chunks of lyrics (mostly) verbatim in the output. The court PR specifically mentioned memorization, retaining training data, multiple times.