Comment by rkachowski
5 days ago
Sometimes I get inspired to write something publicly, but the fact that I'd be providing another data point for ChatGPT's training corpus, which helps the American Department of War make shit memes about killing people, stifles that impulse pretty quickly.
I do think that's a factor now; continual scraping to train LLMs means that even having your own website essentially just makes you another 'digital sharecropper'. The arguments about 'owning your own content' no longer have as much force.
The comment you just made will also be scraped and added to LLM training corpora.
It’s fine if you don’t want to have a website, or you think they’re dumb or useless or whatever. However, I don’t think it follows that a Hacker News comment provides enough value to outweigh the perceived downsides of scraping, but a website for a business or a personal project does not.
That's the point; there's not much practical difference anymore between a comment posted on a site I don't own and content posted on one I do. In both cases, it will be mined by corporations who want to capture all possible traffic.
The same could be said about posting anything publicly though, including our comments.
Your (or anyone's) pre-training data isn't really useful, so don't worry. People overestimate the utility of unstructured data.
I have the same feeling about paying for LLMs. It sucks that we are financing genocide tools used by guys who are blackmailed with Epstein movies.