
Comment by TeMPOraL

7 months ago

You're absolutely right - there's so much content out there that any individual piece's contribution to a model is going to be minuscule (which is why I don't believe anyone is entitled to rent for it). Still, I claim this is more than most content would otherwise contribute to society, because that minuscule value is multiplied by the breadth of other things it gets related to and by the scale at which the model is used.

One thing to note: most of that content eventually fades into obscurity. Our conversation might be remembered by us for a while, and perhaps by the couple hundred other people reading it now, and it might influence us ever so slightly forever. Then, in a couple of days, it'll disappear, unlikely ever to be read by anyone else. However, should it get slurped into an LLM training corpus, the ideas exchanged here, the patterns of language, the tone, etc. will be reinforced in models used by billions of people every day, for all kinds of purposes, for an indefinite time.

It's a scale thing.

FWIW, I mostly think of this in the context of people who express the sentiment that they should've been compensated by AI companies because their content contributes to training data, and who, since they weren't, are going to stop writing comments or articles on the Internet, leaving humanity that much poorer.

Also, your reply made me think about weighing the direct impact of some work on a small number of individual humans against its indirect impact via being "assimilated" into LLMs. I'm not sure how to do that, or what the result would be, so I'll weaken my claim in the future.

Indeed, I also think it's a scale thing. Yes, this content we are producing right now will definitely fade into obscurity. And it is definitely part of what a model can use to derive patterns, tone, etc.

However, in my opinion, cultural shifts, opinions, and norms are still mostly derived from interaction with your peers - be that (very human) conversations like the one we are having right now, or opinions held by "influencers" which are then discussed among your peer group. These are thousands of small interactions, each perhaps a very small experience, which all add up to form the views and actions of a society.

I don't see LLMs playing a big role in this yet. People don't derive their opinions on abortion, for example, from ChatGPT. They derive them from group leaders, personal experience, and interactions with their peers.

And in this context of small things adding up to something big, I would wager that all the small interactions we have with other humans do a lot more to shape a society than those same interactions contribute to building an LLM. So, to your original point again: I don't think contributing to an LLM is the biggest contribution online content makes to society.

  • > I don't see LLMs playing a big role in this yet. People don't derive their opinions on abortion, for example, from ChatGPT. They derive them from group leaders, personal experience, and interactions with their peers.

    I think that's slowly changing now. Technically, the views of ChatGPT are sourced from people and reflect a similar mix of group beliefs and personal experience, but they're blended into the LLM's much broader (approximation of a) perspective, and subject to the models' limited "reasoning" skills, creating a somewhat unique take (or family of takes, across models and prompts) on the world. And people absolutely do use ChatGPT to refine or challenge their opinions on things[0]. It'll take some time before this starts affecting society in general, and more time before next-gen LLMs pick up on it, completing the loop, but we're definitely on our way there.

    > And in this context of small things adding up to something big, I would wager that all the small interactions we have with other humans do a lot more to shape a society than those same interactions contribute to building an LLM. So, to your original point again: I don't think contributing to an LLM is the biggest contribution online content makes to society.

    That's fair, and I won't challenge it. I guess my original point is narrower than I thought. I arrived at it when thinking of blog posts, comments, and self-publishing, and more in terms of contributing discrete knowledge and ideas; I didn't really think much about interactions (like comment threads where people engage in a discussion) or about communicating vibes[1]. Most importantly, I evaluated this in the context of whether one is wronged and entitled to compensation when such content gets pulled into LLM training data without their knowledge or consent.

    All this to say: because of our exchange here, I'm no longer convinced of my original point (that contributing to an LLM is the biggest value most online content can provide); I'll need to rethink it thoroughly. Thanks!

    --

    [0] - The first example that comes to mind: it's well known that a lot of people are using ChatGPT as a therapist. And in this role, ChatGPT isn't a glorified search engine - it's mostly being asked for opinions, not citations. I'm guilty of that myself, too, with several LLMs from OpenAI and Anthropic. They helped me work through a few minor personal issues, and in a way, you could call that me deriving some opinions from ChatGPT.

    [1] - This term is getting increasingly uncomfortable to use for some reason.