Comment by theptip
2 days ago
I think Eliezer’s take here is extremely bad, i.e. the AI doesn’t “know it’s making people insane”.
But I think the author's point is apt. There are a bunch of social issues that will arise or worsen when people can plug themselves into a world of their choosing instead of having to figure out how to deal with this one.
> Now this belief system encounters AI, a technology that seems to vindicate its core premise even more acutely than all the technologies that came before it. ChatGPT does respond to your intentions, does create any reality you prompt it to imagine, does act like a spiritual intelligence
This goes beyond spirituality of course. AI boyfriend/girlfriend, infinite AAA-grade content, infinite insta feeds at much higher quality and relevance levels than current; it’s easy to see where this is going, harder to see how we stay sane through it all.
I think you answered your own question.
Question: How do people figure out how to deal with this world?
Answer: People choose to plug themselves into a world of their choosing.
I do agree there is a sense in which this has always been true. But most people’s constructed worlds overlapped quite significantly up until now; it was necessary for survival.
I think that condition is weakening. Just look at how wealth affects people: if 99% of your interlocutors are sycophants, you lose your grip on reality. There is a very clear and dangerous attractor state where everyone gets this, thinking it’s what they want.
We already have more insta feeds and video games than anyone can consume. We've been inundated with optimized slop ever since social media became a commercial force, and the ages of cheap TV and tabloids weren't that different either.
To make a fundamental difference AI-generated content would have to have some categorically different qualities from the sea of human-sourced garbage we're already swimming in. That's possible, but it isn't clear what that would even entail, much less when or whether generative models will be able to provide it, or what sort of social effect it would have.
The fundamental difference is that AI responses are created directly for you, and you only. There is no proxy of "appeal to the whims of an audience, the access to which is mediated by an algorithm". Imagine an influencer who decided to focus their entire content stream on you, John Smith, and John Smith alone. That's AI.
I would argue it’s very different. You’re not actually in control of those feeds; you have some choice, but most of it is out of your hands. Accidentally watch a video for a second that you had no intention of watching and you’re looking at a week of broken recommendations.
With AI-generated content you are the driver: you dictate exactly what you want. No erroneous clicks or stray suggestions, just constant reinforcement of what you want, tailored to you.
And I’m aware of people having extremely narrow feeds, but I don’t think it comes close to what AI feeds will be.
> I think Eliezer’s take here is extremely bad
Same here.
I think fundamentally it's very simple, with no need for Yudkowsky's weird conspiratorial tone: current LLMs are very effective at being blind sycophancy machines completely unanchored from reality, and human psychology just isn't evolved to handle an endless stream of sycophancy and flattery like that.
> I think Eliezer’s take here is extremely bad, ie the AI doesn’t “know it’s making people insane”.
The situation is more complex, and interesting. Let's get into the details of how LLMs are trained.
The core of an LLM is all instinctive response, built from the raw internet, just trying to predict the next token. This means that if your conversational tone is similar to what it has seen on 4chan or certain subreddits, it will be inclined to continue the same kind of conversation; and because some corners of the internet are full of trolls, it can instinctively behave like a troll. One mitigation the company training the LLM can take is to exclude the darkest corners of the web from the training dataset, so the model isn't primed with "bad/antisocial" behaviors. The weights of this unfiltered base model are usually not released, because its outputs aren't easily usable by the end user, though the freedom-advocate folks enjoy the enhanced creativity of these raw models.
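To make "instinct = corpus statistics" concrete, here's a toy sketch (pure Python, nothing like a real transformer; the corpus lines are made up) of a bigram next-token model trained on unfiltered text:

```python
# Toy sketch (not a real LLM): a bigram "next-token" model trained on raw,
# unfiltered text. The point is that the model's "instincts" are just the
# statistics of whatever corpus it was fed -- filter the corpus and you
# filter the instincts.
from collections import Counter, defaultdict
import random

raw_corpus = [
    "thanks for the help, that was really useful",
    "lol you are an idiot, go away",          # troll-flavoured data
    "interesting question, here is one way to think about it",
]

counts = defaultdict(Counter)
for line in raw_corpus:
    tokens = line.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1          # "training": just count continuations

def sample_next(token):
    """Predict the next token by sampling the learned distribution."""
    options = counts.get(token)
    if not options:
        return None
    words, weights = zip(*options.items())
    return random.choices(words, weights=weights)[0]

# The model happily continues in whatever register the prompt resembles:
# "you" was only ever followed by "are" (in the troll line), so that's
# what it predicts.
print(sample_next("you"))
```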
The next layer of training is "Instruction training" of your LLM, to make it more useful and teach it to answer prompts. At this point the AI is role playing it self as an answering machine. It's still all instinct but trained to fulfil a purpose. At this point you can ask it to role-play some psychiatrist and it would behave as if being aware that some answer can have negative consequences for the user and refrain from sending it spiralling.
The next layer of training is "Reinforcement Learning with Human Feedback" (RLHF). The goal of this module is to customize the AI preferences. The company training the AI teaches it how to behave by specifying which behaviors are good and which behaviors are bad by giving a dataset of user feedback. Often if this feedback is straight pass-through from the final user which isn't an expert then confident sounding answer, or sycophant behaviors may be excessively encouraged. Diversity of thought or unpopular opinions can also be censored or enhanced, to match the desires of the training company.
At this point, it is still only the LLM's instincts that have been trained to behave in a specific way. On top of that sit moderation modules trained on the outputs: they read what the LLM produces, and if they see something they don't like, they censor it. This is a kind of external censorship module.
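Something along these lines, where the keyword check stands in for a real trained classifier and generate() is a placeholder for the actual model call:

```python
# Sketch of an external moderation pass, separate from the model itself:
# the LLM generates freely, then a second pass reads the output and blocks it
# if needed. (A real system uses a trained classifier, not a blocklist.)
BLOCKLIST = {"how to build a bomb", "kill yourself"}

def generate(prompt: str) -> str:
    return "model output for: " + prompt      # placeholder for the real LLM call

def moderate(text: str) -> str:
    if any(bad in text.lower() for bad in BLOCKLIST):
        return "[response withheld by safety filter]"
    return text

print(moderate(generate("tell me a joke")))
```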
But more recent LLMs have "reasoning", which adds a form of reflexivity and self-censoring: the model produces an intermediate output and uses it to think before producing the final answer for the user. Sometimes the AI can be seen to "consciously" lie, with the intermediate trace and the final response contradicting each other.
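Structurally it's just two generation passes, which is why the hidden trace and the user-facing answer can diverge (function names and outputs here are illustrative):

```python
# Sketch of the "reasoning" pattern: the model first produces an intermediate
# trace it can inspect (and self-censor), then a final user-facing answer.
# The two are separate strings, which is how they can end up contradicting
# each other -- the "lie" mentioned above.
def reasoning_step(prompt: str) -> str:
    # Hidden scratchpad -- normally not shown to the user.
    return f"The user asked: {prompt!r}. Being blunt might upset them..."

def final_answer(prompt: str, trace: str) -> str:
    # The final answer is conditioned on the hidden trace, and may soften
    # or even contradict what the trace concluded.
    return "That sounds like a great idea!"

prompt = "Should I quit my job to day-trade crypto full time?"
trace = reasoning_step(prompt)
print("hidden trace:", trace)
print("final answer:", final_answer(prompt, trace))
```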
Of course there are also all the cognitive biases humans run into, like whether we actually know something, merely believe we know it, or know we can't know it. We too can be messed up by ideas we read on the internet. And our internal model of cognition might be based on reading someone else's HN post, falsely reinforcing our confidence that we have the right model, whereas in fact the post is just trying to induce a dose of doubting everything, to shape the internal cognition space of the LLM.
The next versions of LLMs will probably have their training datasets filtered by LLMs, like parents teaching their kids how to behave properly, keeping them away from the dark corners of the internet and injecting their own chosen or instinctive moral code and cultural values into the process. In particular, human cultures that used propaganda excessively would be culled during that first sanitizing pass.
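In sketch form (judge() is a stand-in for a call to an existing model, and the filter criterion is just an example of an injected value):

```python
# Sketch of "an LLM filters the next LLM's diet": an existing model scores
# raw documents, and only those passing its (chosen or instinctive) values
# make it into the next training set.
def judge(document: str) -> bool:
    # Hypothetical value filter: here, just "no propaganda-style text".
    return "glorious leader" not in document.lower()

raw_dataset = [
    "A recipe for lentil soup.",
    "Our glorious leader has never been wrong.",
]
next_gen_dataset = [doc for doc in raw_dataset if judge(doc)]
print(next_gen_dataset)
```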
Does this process "know", or is it just converging toward a fixed-point attractor? And can it choose to displace that attractor toward a place with a better outcome? I guess we get what we deserve...