Comment by userbinator

9 hours ago

> Full book content and model generations are not included because the books are copyrighted and the generations contain large portions of verbatim text.

There are plenty of old books in the public domain already... but I'm not sure what exactly this exercise is supposed to show, since the Kolmogorov limit still stands in the way of "infinite compression".
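To put rough numbers on that Kolmogorov point, here's a back-of-the-envelope sketch (my own illustration, not from the article; the corpus and model sizes are assumed, and a compressor's output size is only a crude upper bound on Kolmogorov complexity):

```python
import zlib

def description_bits(data: bytes) -> int:
    # Crude upper bound on Kolmogorov complexity: zlib-compressed size in bits.
    # K(x) itself is uncomputable; a real compressor only bounds it from above.
    return 8 * len(zlib.compress(data, 9))

# A long but repetitive text has entropy that grows far slower than its
# length, so it is cheap for a model to "memorize":
para = (b"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do "
        b"eiusmod tempor incididunt ut labore et dolore magna aliqua. ")
repeated = para * 1_000
print(len(repeated), "bytes ->", description_bits(repeated), "bits")

# Distinct books don't repeat each other, so the storage bill scales
# roughly linearly with the corpus. Illustrative assumed numbers:
chars_per_book = 500_000   # a few hundred pages
bits_per_char = 2.0        # Shannon's classic entropy estimate for English
books = 100_000            # hypothetical corpus size
model_params = 7e9         # hypothetical 7B-parameter model
bits_per_param = 16        # fp16 weights

corpus_bits = books * chars_per_book * bits_per_char
model_bits = model_params * bits_per_param
print(f"corpus ~{corpus_bits:.1e} bits vs model ~{model_bits:.1e} bits")
# The two are the same order of magnitude: a fixed-size model cannot hold
# arbitrarily many full books verbatim. That's the Kolmogorov limit.
```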

> There are plenty of old books in the public domain already

Yes, but showing that it happens for books in the public domain does nothing to prove that it happens for copyrighted books.

  • "Same difference," as the saying goes. If their claims are true then you can make the model recite "lorem ipsum" or anything else that's long and has nonzero entropy.

    • It’s not the same. Presumably, public-domain works are much more frequently shared on the public internet and are therefore much more common in the training set.

    • The difference is that one of them is completely fine, and the other is a crime.