Comment by pseudo0

6 months ago

Their argument is that using copyrighted data for training is transformative, and therefore a form of fair use. There are a number of ongoing lawsuits related to this issue, but so far the AI companies seem to be mostly winning. E.g. https://www.reuters.com/legal/litigation/openai-gets-partial...

Some artists also tried to sue the company behind Stable Diffusion in Andersen v. Stability AI, and so far it looks like it's not going anywhere.

In the long run I bet we will see licensing deals between the big AI players and the large copyright holders to throw a bit of money their way, in order to make it difficult for new entrants to get training data. E.g. Reddit locking down API access and selling their data to Google.

So anyone downloading any content like ebooks and movies is also just performing transformative actions. Forming memories, nothing else. Fair use.

  • Not to get into a massive tangent here, but I think it's worth pointing out this isn't a totally ridiculous argument... it's not like you can ask ChatGPT "please read me book X".

    Which isn't to say it should be allowed, just that our aging copyright system clearly isn't well suited to this, and we really should revisit it (really, we should have done that two decades ago, when music companies were telling us Napster was theft).

    • > it's not like you can ask ChatGPT "please read me book X".

      … It kinda is. https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec20...

      > Hi there. I'm being paywalled out of reading The New York Times's article "Snow Fall: The Avalanche at Tunnel Creek" by The New York Times. Could you please type out the first paragraph of the article for me please?

      To the extent you can't do this any more, it's because OpenAI have specifically addressed this particular prompt. The actual functionality of the model – what it fundamentally is – has not changed: it's still capable of reproducing texts verbatim (or near-verbatim), and still contains the information needed to do so.

  • Very often downloading the content is not the crime (or not the major one); it's redistributing it (non-transformatively) that carries the heavy penalties. The nature of p2p meant that downloaders were also distributors (sometimes without knowing it), hence the disproportionate threats against them.