Comment by runnig

11 hours ago

I'll just leave it here: "Anthropic's downloading of over seven million books from pirate sites like LibGen constituted infringement, the judge ruled, rejecting Anthropic's 'research purpose' defense: 'You can't just bless yourself by saying I have a research purpose and, therefore, go and take any textbook you want.'"

https://www.joneswalker.com/en/insights/blogs/ai-law-blog/wh...

53 comments


scientism 9 hours ago

Don't you find it funny that when you ask for song lyrics these models suddenly remember copyrighted material?

f6v 9 hours ago

Some do, others decline to answer.

rienbdj 9 hours ago

In the early days of music streaming, many of the entrants were seeding their service with vast libraries of pirated content. The winners cut deals with the copyright holders and then went after the rest.

smurda 8 hours ago
Or in the early days of video uploads: YouTube's most watched videos were "pirated" clips from popular shows (e.g. SpongeBob, The Daily Show), which was part of the reason I went to YouTube instead of other video hosting sites (e.g. DailyMotion).
Viacom sued YouTube, while CBS and Universal ended up licensing their content.
https://www.eff.org/deeplinks/2007/03/viacom-v-google-invest...
- radicalbyte 8 hours ago
  
  They still are. My kids haven't watched a single Simpsons or Family Guy episode but are quoting both regularly.
  Facebook et al. also quite literally stole email contact lists and, via the phone manufacturers, installed kernel-level spyware on mobile phones, which they used to spy on all Android users.

nicce 11 hours ago

Yet they did not need to destroy the models which were trained with them?

ascorbic 11 hours ago
Using them was allowed as fair use – it was the downloading of the pirated copies that was infringement. That's why Anthropic switched to scanning paper books.
- maccard 11 hours ago
  
  > That's why Anthropic switched to scanning paper books.
  After they threw away all the tainted data from the pirated books, right?
  
- pera 9 hours ago
  
  > Using them was allowed as fair use
  That is only relevant in the US, and even there it is still not clear-cut whether the fair use doctrine applies to all these scenarios. Outside the US the situation is also quite different: for example, take a look at the recent ruling on GEMA vs OpenAI in Germany.
  The reality is that the copyright issue with generative AI is very complex and reaching anything resembling a conclusion will take much more than a few opinion paragraphs from an American district judge.
- kykeonaut 10 hours ago
  
  Isn't scanning also a form of copyright infringement? You are making a digital copy of a book, which is the same thing as downloading a book from the internet...
  
- olalonde 10 hours ago
  
  > That's why Anthropic switched to scanning paper books.
  Could they not just subscribe to the academic publishers like universities do? Or buy eBooks? I don't understand how the "scanning" part is relevant here, other than perhaps used physical books being cheaper.
  
- realusername 11 hours ago
  
  If using the books is fair use, then distilling the model, which is just a derived product of those books, is also fair use.
  These companies are trying to have their cake and eat it too.
  
- nicce 11 hours ago
  
  In a different world it is not fair use. The benefits of a crime should always be taken away. If you isolate the training from the pirating, you may say that it was fair, but that completely misses the point. The sole purpose of the pirating (aka the crime) was to train the models.
  
zaptrem 11 hours ago
Should we require the destruction of the brains of those that watch pirated movies?
- hmry 11 hours ago
  
  Different situations call for different responses.
  When someone steals a watch, we force them to give it back. Yet when someone steals a cake and eats it, we don't force them to puke it back up.
  If you pirate a movie, the court might very well force you to delete all the copies you made of the movie you downloaded, destroy DVDs you burned, etc.
  
- TightFibre 10 hours ago
  
  Well I enjoyed this response.
- nicce 5 hours ago
  
  Have we already agreed that AI is equal to human life and not a machine?

RobotToaster 10 hours ago

"You're trying to kidnap what I've rightfully stolen!"

gmerc 11 hours ago

How many “capabilities” did they “extract” from those books?

thepasch 10 hours ago

The capabilities of the books' writers to produce the text contained within them, which is exactly what Alibaba "extracted" from Claude. The point here is that Anthropic's framing as some sort of sophisticated technological attack is the ridiculous part. It's writing prompts and saving responses. We're all running "distillation attacks" on Claude, every day! Most of us just don't feed that stuff into a training corpus.

basisword 10 hours ago

Exactly. Couldn't happen to better people. I'm pretty against piracy personally but if we find reliable ways to pirate Anthropic/OpenAI products in the future I'm all for it.