Comment by red75prime

1 day ago

Shredding is a machine transformation. Does it mean that shreds retain original copyright even if the content can't be restored and the provenance can't be traced? Just an example that treating all machine transformations equally with no regard to the specifics doesn't make much sense.

And the specifics of autoregressive pretraining is that it is lossy compression. Good luck finding which copyrighted materials have made it into the final weights.

> Does it mean that shreds retain original copyright even if the content can't be restored?

Yup, it absolutely does. In fact, that's why you are still violating copyright law by using bittorrent even though each of the users is only giving out a small slice or shred of the original content.

The US has a granted defense in the case of something like shredding called "Fair Use" but that doesn't mean or imply that a copyright is void simply because of a fair use claim.

> And the specifics of autoregressive pretraining is that it is lossy compression.

That doesn't matter. Why would it? If I take a FLAC recording and change it to an MP3. The fact that it was a lossy transform doesn't suddenly give me the legal right to distribute the MP3.

> Good luck finding which copyrighted materials have made it into the final weights.

That's what the NYT v. OpenAI lawsuit is all about. And for earlier models they could, in fact, pull out full NYT articles which proved they made it into the final weights.

Further, the NYT is currently in discovery which means OpenAI must open up to the NYT what goes into their weights. A move that, if OpenAI loses, other litigants can also use because there's a real good shot that OpenAI also included their works in the dataset.

  • > Yup, it absolutely does

    Well, it's not the first time when the law contradicts laws of nature (for the entertainment of the future generations). Bittorent is not a relevant example, because the system is designed to restore the work in its fullness.

    > in fact, pull out full NYT articles

    That's when they used their knowledge of the exact text they wanted to "retrieve" to get the text? It wouldn't be so efficient with a random number generator, but it's doable.

    • > Bittorent is not a relevant example, because the system is designed to restore the work in its fullness.

      You can restore shredded documents with enough time and effort. And if you did that and started making photo copies, even if they are incomplete, you will run afoul of copyright law.

      Bittorrent is a relevant example because it shows that shredding doesn't destroy copyright.

      Remember, copyright is about the right to copy something. Simply shredding or destroying a thing isn't applicable to copyright. Nor is giving that thing away. What's applicable is when you start to actually copy the thing.

      4 replies →