Comment by dehrmann

6 months ago

The important parts:

> Alsup ruled that Anthropic's use of copyrighted books to train its AI models was "exceedingly transformative" and qualified as fair use

> "All Anthropic did was replace the print copies it had purchased for its central library with more convenient space-saving and searchable digital copies for its central library — without adding new copies, creating new works, or redistributing existing copies"

It was always somewhat obvious that pirating a library would be copyright infringement. The interesting findings here are that scanning and digitizing a library for internal use is OK, and using it to train models is fair use.

You skipped quotes about the other important side:

> But Alsup drew a firm line when it came to piracy.

> "Anthropic had no entitlement to use pirated copies for its central library," Alsup wrote. "Creating a permanent, general-purpose library was not itself a fair use excusing Anthropic's piracy."

That is, he ruled that

- buying, physically cutting up, physically digitizing books, and using them for training is fair use

- pirating the books for their digital library is not fair use.

  • > buying, physically cutting up, physically digitizing books, and using them for training is fair use

    So Suno would only really need to buy the physical albums and rip them to be able to generate music at an industrial scale?

    • Yes! Training and generation are fair use. You are free to train and generate whatever you want in your basement for whatever purpose you see fit. Build a music collection, go ham.

      If the output from said model uses the voice of another person, for example, we already have a legal framework in place for determining if it is infringing on their rights, independent of AI.

      Courts have heard cases of individual artists copying melodies, because melodies themselves are copyrightable: https://www.hypebot.com/hypebot/2020/02/every-possible-melod...

      Copyright law is a lot more nuanced than anyone seems to have the attention span for.

    • Not sure we can infer that (or anything) about Suno from this ruling. The judge here said that Anthropic's usage was extremely transformative. Would Suno's also be considered that way?

      Anthropic doesn't take books and use them to train a model that is intended to generate new books. (Perhaps it could do that, to some extent, but that's not its [sole] purpose.)

      But Suno would be taking music to train a model in order to generate new music. Is that transformative enough? We don't know what a judge thinks, at least not yet.

    • Only if the physical albums don't have copy protection; otherwise you're circumventing it, and that's illegal. Or is it, given the right to make a private copy? If anything, AI at least shows that all of the existing copyright laws are utter bullshit made to make Disney happy.

      Do keep in mind though: this is only for the wealthy. They're still going to send the Pinkertons to your house if you dare copy a Blu-ray.

  • As they mentioned, the piracy part is obvious. It's the fair use part that will set an important precedent for being able to train on copyrighted works as long as you have legally acquired a copy.

  • So all they have to do is go and buy a copy of each book they pirated. They will have ceased and desisted.

    • I'm trying to find the quote, but I'm pretty sure the judge specifically said that going and buying the book after the fact won't absolve them of liability. He said that, for the books they pirated, they broke the law and should stand trial for that; they cannot go back and un-break it by buying a copy now.

      Found it: https://www.nbcnews.com/tech/tech-news/federal-judge-rules-c...

      > “That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft,” [Judge] Alsup wrote, “but it may affect the extent of statutory damages.”

    • > So all they have to do is go and buy a copy of each book they pirated.

      No, that doesn't undo the infringement. At most, that would mitigate actual damages, but actual damages aren't likely to be important, given that statutory damages are an alternative and are likely to dwarf actual damages. (It may also figure into how the court assigns statutory damages within the very large range available for those, but that range does not go down to $0.)

      > They will have ceased and desisted.

      "Cease and desist" is just to stop incurring additional liability. (A potential plaintiff may accept that as sufficient to not sue if a request is made and the potential defendant complies, because litigation is uncertain and expensive. But "cease and desist" doesn't undo wrongs and neutralize liability when they've already been sued over.)

    • Generally you don't want laws to work that way. You want to set the penalties so that they discourage violating the law.

      Setting the penalty to what it would have cost to obey the law in the first place does the opposite.

  • > That is, he ruled that

    > - buying, physically cutting up, physically digitizing books, and using them for training is fair use

    > - pirating the books for their digital library is not fair use.

    Those two seem inconsistent with one another. If it's fair use, how is it piracy?

    It also seems pragmatically trash. It doesn't do the authors any good for the AI company to buy one copy of their book (and a used one at that), but it does make it much harder for smaller companies to compete with megacorps for AI stuff, so it's basically the stupidest of the plausible outcomes.

    • These are two separate actions that Anthropic did:

      * They downloaded a massive online library of pirated books that someone else was distributing illegally. This was not fair use.

      * They then digitised a bunch of books that they physically owned copies of. This was fair use.

      This part of the ruling is pretty much existing law. If you have a physical book (or own a digital copy of a book), you can largely do what you like with it within the confines of your own home, including digitising it. But you are not allowed to distribute those digital copies to others, nor are you allowed to download other people's digital copies that you don't own the rights to.

      The interesting part of this ruling is that once Anthropic had a legal digital copy of the books, they could use it for training their AI models and then release the AI models. According to the judge, this counts as fair use (assuming the digital copies were legally sourced).

  • > You skipped quotes about the other important side:

    He said:

    > It was always somewhat obvious that pirating a library would be copyright infringement.

    ??

  • From my understanding:

    > pirating the books for their digital library is not fair use.

    "Pirating" is a fuzzy word and has no real meaning. Specifically, I think this is the cruz:

    > without adding new copies, creating new works, or redistributing existing copies

    Essentially: downloading is fine; sharing/uploading is not. Which makes sense. The assertion here is that Anthropic (from this line) did not distribute the files they downloaded.

    • The legal context here is that "format shifting" has not previously been held to be sufficient for fair use on its own, and downloading for personal use has also been considered infringing. Just look at the numerous media industry lawsuits against individuals that only mention downloading, not sharing, for examples.

      It's a bit surprising that you can suddenly download copyrighted materials for personal use and it's kosher as long as you don't share them with others.

    • Downloading and using pirated software in a company is fine, then, as long as it is not shared outside? If what you describe is legal, it makes no sense to pay for software.

I don't think that's new. Google set a precedent for that more than a decade ago with the Google Books case. You're allowed to transform a book into a digital copy.

How times change. They wanted to lock up Aaron Swartz for life for essentially doing the same thing Anthropic is doing.

  • Aaron Swartz wanted to provide the public with open access to paywalled journal articles, while Anthropic wants to use other people's copyrighted material to train its own private models that it restricts access to via a paywall. It's wild (but unsurprising) that Aaron Swartz was prosecuted under the CFAA for this while Anthropic is allowed to become commercially successful.

I'm not sure how I feel about what Anthropic did on the merits as a matter of scale, but from a legalistic standpoint, how is it different from using the book to train the meat model in my head? I could even learn bits by heart and quote them in context.

  • Not sure about the law, but if you memorize and quote bits of a book and fail to attribute them, you could be accused of plagiarism. If for example you were a journalist or researcher, this could have professional consequences. Anthropic is building tools to do the same at immense scale with no concept of what plagiarism or attribution even is, let alone any method to track sourcing--and they're still willing to sell these tools. So even if your meat model and the trained model do something similar, you have a notably different understanding of what you're doing. Responsibility might ultimately fall to the end user, but it seems like something is getting laundered here.

Is the fruit of the poisonous tree rule applicable here?

  • That's only really applicable to evidence in criminal cases obtained by the government. No such doctrine exists for civil cases, for instance. It doesn't even bar the government from using evidence that others have collected illegally of their own volition.