Comment by 112233

19 hours ago

It is actively dangerous too. You can be as self-aware and LLM-aware as you want, but if you routinely read "This is such an excellent point", "You are absolutely right" and so on, it does your mind in. This is the worst kind of global reality-show MKUltra...

Deepseek is GOATed for me because of this. If I ask it whether "X" is a dumb idea, it is very polite in telling me that X is dumb if it knows of a better way to do the task.

Every other AI I've tried is a real sycophant.

  • I'm partial to the tone of Kimi K2: terse, blunt, sometimes even dismissive. It does not require "advanced techniques" to avoid the psychosis-inducing tone of Claude/ChatGPT.

It might explain the stereotype that the more beautiful the woman, the crazier she is (everybody tells her what she wants to hear).

So this is what it feels like to be a billionaire, with all the yes-men around you.

  • you say that like it's a bad thing! Now everyone can feel like a billionaire!

    but I think you are on to something here about the origin of the sycophancy, given that most of these models are owned by billionaires.

No doubt. From cults' 'love bombing' to dictators' 'yes men' to celebrity entourages, it's a well-known hack on human psychology. I have a long-time friend, a brilliant software engineer, who recently realized that conversing with LLMs was affecting his objectivity.

He was noodling around with an admittedly "way out there", highly speculative idea and using the LLM to research prior work in the area. This evolved into the LLM giving him direct feedback. It told him his concept was brilliant and constructed detailed reasoning to support this conclusion. Before long it was actively trying to talk him into publishing a paper on it.

This went on for quite a while. At first he was buying into it, but eventually he started to suspect that maybe "something was off", so he reached out to me for perspective. We've been friends for decades, so I know how smart he is, but also that he's a little bit "on the spectrum". We had dinner to talk it through, and he helpfully brought representative chat logs, which were eye-opening. It turned into a long dinner. Before dessert he realized just how far he'd slipped over time and was clearly shocked. In the end, he resolved to go "cold turkey" on the LLMs with a 'prime directive' prompt like the one I use (basically: never offer opinion, praise, flattery, etc.). Of course, even then, it will still occasionally try to ingratiate itself in more subtle ways, which I have to keep watch for.

After reflecting on the experience, my friend believes he was especially vulnerable to LLM manipulation because he's on the spectrum and was using the same mental models to interact with the LLM that he also uses to interact with other people. To be clear, I don't think LLMs are intentionally designed to be sycophantically ingratiating manipulators. I think it's just an inevitable consequence of RLHF.

  • And that is a relatively harmless academic pursuit. What about topics that can lead to true danger and violence?

    "You're exactly right, you organized and paid for the date, that created a social debt and she failed to meet her obligation in that implicit deal."

    "You're exactly right, no one can understand your suffering, nothingness would be preferable to that."

    "You're exactly right, that politician is a danger to both the country and the whole world, someone stopping him would become a hero."

    We have already seen how personalized content algorithms that prioritize nothing but keeping the user on the system can foment extremism. It will be incredibly dangerous if we go down that path with AI.

  • Claude Code with their models is unusable because of this. The fact that it keeps actively sabotaging and ruining the code ("Why did you delete that working code? Just use ifdef for the test!" "This is a genius idea! You are absolutely right!") does not make it much better; it's a twisted Wonderland fever dream.

    For "chat" chat, strict hygiene is a matter of mind-safety: no memory, long exact instructions, minimum follow-ups, avoiding first and second person if possible etc.