Comment by muwtyhg

2 days ago

> The idea of "all the public works ever created" is easily contested.

Hence the word "public," implying that they are published and accessible.

> The internet now allows potentially anyone to publish anything, e.g., via personal websites, social media pages, etc. But that doesn't mean everyone partakes. How much of the unfiltered garbage published by those who do has been used to create these "models"?

This seems like a nitpick rather than an actual response to the idea that they have stolen massive amounts of other people's work and are using it to enrich themselves. And the stealing is ignored or given a slap-on-the-wrist fine, which is not how it has worked for numerous other people in the past (Aaron Swartz being the example). Whether or not the models train on low-effort text from the internet is kind of irrelevant.