Comment by captainkrtek

3 months ago

I'd have more appreciation and trust in an LLM that disagreed with me more and challenged my opinions or prior beliefs. The sycophancy drives me towards not trusting anything it says.

This is why I like Kimi K2/Thinking. IME it pushes back really, really hard on any kind of non-obvious belief or statement, and it doesn't give up after a few turns; it just keeps going, iterating, refining, and restating its points if you change your mind or take its criticisms on board. It's great for having a dialectic around something you've written, although somewhat unsatisfying because it'll never agree with you. But that's fine, because it isn't a person, even if my social monkey brain feels like it is and wants it to agree with me sometimes. Someone even ran a quick-and-dirty analysis of which models are better or worse at pushing back on the user, and Kimi came out on top:

https://www.lesswrong.com/posts/iGF7YcnQkEbwvYLPA/ai-induced...

See also the sycophancy score of Kimi K2 on Spiral-Bench: https://eqbench.com/spiral-bench.html (expand details, sort by inverse sycophancy).

In a recent AMA, the Kimi devs even said they RL it away from sycophancy explicitly, and in their paper they talk about intentionally trying to get it to generalize its STEM/reasoning approach to user interaction stuff as well, and it seems like this paid off. This is the least sycophantic model I've ever used.

  • Which agent do you use it with?

    • I typically use K2 non-thinking in OpenCode for coding, and I haven't found a satisfactory chat interface yet, so I use K2 Thinking in the default synthetic.new (my AI subscription) chat UI, which is pretty barebones. I'm going to start trying K2T in OpenCode as well, but I'm actually not a huge fan of thinking models as coding agents; I prefer faster feedback.

    • I don't use it much, but I tried it out with okara.ai and loved their interface. No other connection to the company.

Everyone telling you to use custom instructions etc. doesn't realize that they don't carry over to voice.

Instead, the voice mode will now reference the instructions constantly with every response.

Before:

Absolutely, you’re so right and a lot of people would agree! Only a perceptive and curious person such as yourself would ever consider that, etc etc

After:

Ok here’s the answer! No fluff, no agreeing for the sake of agreeing. Right to the point and concise like you want it. Etc etc

And no, I don’t have memories enabled.

  • Having this problem with the voice mode as well. It makes it far less usable than it might be if it just honored the system prompts.

Google's search now has the annoying feature that a lot of searches which used to work fine give a patronizing reply like "Unfortunately 'Haiti revolution persons' isn't a thing", or an explanation that "This is probably shorthand for [something completely wrong]".

  • That latter thing — where it just plain makes up a meaning and presents it as if it's real — is completely insane (and also presumably quite wasteful).

    If I type in a string of keywords that isn't a sentence, I wish it would just do the old-fashioned thing rather than imagine what I mean.

Just set a global prompt to tell it what kind of tone to take.

I did that and it points out flaws in my arguments or data all the time.

Plus it no longer uses any cutesy language. I don't feel like I'm talking to an AI "personality", I feel like I'm talking to a computer which has been instructed to be as objective and neutral as possible.

It's super-easy to change.
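
(For anyone doing this through the API rather than the settings UI, the same idea is just a system message sent with every request. Here's a minimal sketch using the OpenAI Python SDK; the model name and instruction wording are placeholders of mine, and this isn't necessarily how the ChatGPT settings feature works under the hood.)

    # Rough API analogue of a "global prompt": prepend a system message to
    # every request. Instruction text and model name are placeholders.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    TONE_INSTRUCTIONS = (
        "Maintain a strictly objective, neutral tone. Do not flatter the user. "
        "Point out flaws in the user's arguments or data whenever you see them."
    )

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": TONE_INSTRUCTIONS},
            {"role": "user", "content": "Here's my argument: ..."},
        ],
    )
    print(response.choices[0].message.content)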

  • I have a global prompt that specifically tells it not to be sycophantic and to call me out when I'm wrong.

    It doesn't work for me.

    I've been using it for a couple of months, and it's corrected me only once, and it still starts every response with "That's a very good question." I also included "never end a response with a question," and it just completely ignored that so it can do its "would you like me to..."

    • Another one I like to use is "never apologize or explain yourself. You are not a person you are an algorithm. No one wants to understand the reasons why your algorithm sucks. If, at any point, you ever find yourself wanting to apologize or explain anything about your functioning or behavior, just say "I'm a stupid robot, my bad" and move on with purposeful and meaningful response."

    • Perhaps this bit is a second, cheaper LLM call that ignores your global settings and tries to generate follow-on actions to drive adoption.

    • In my experience GPT used to be good at this stuff, but lately it's become progressively more difficult to get a "memory updated" to persist.

      Gemini is great at these prompt controls.

      On the "never ask me a question" part, it took a good 1-1.5 hours of arguing and memory updating to convince GPT to actually listen.

  • Care to share a prompt that works? I've given up on the mainline offerings from Google/OpenAI, etc.

    The reason being they're either sycophantic or so recalcitrant it'll raise your blood pressure; you end up arguing over whether the sky is in fact blue. Sure, it pushes back, but now instead of sycophancy you've got yourself a pathological naysayer, which is only marginally better; the interaction is still ultimately a waste of time and a productivity brake.

    • Sure:

      Please maintain a strictly objective and analytical tone. Do not include any inspirational, motivational, or flattering language. Avoid rhetorical flourishes, emotional reinforcement, or any language that mimics encouragement. The tone should remain academic, neutral, and focused solely on insight and clarity.

      Works like a charm for me.

      The only thing I can't get it to change is the last paragraph, where it always tries to add "Would you like me to...?" I'm assuming that's hard-coded by OpenAI.

  • I’ve done this when I remember too, but the fact I have to also feels problematic like I’m steering it towards an outcome if I do or dont.

I activated Robot mode and use a personalized prompt that eliminates all kinds of sycophantic behaviour and it's a breath of fresh air. Try this prompt (after setting it to Robot mode):

"Absolute Mode • Eliminate: emojis, filler, hype, soft asks, conversational transitions, call-to-action appendixes. • Assume: user retains high-perception despite blunt tone. • Prioritize: blunt, directive phrasing; aim at cognitive rebuilding, not tone-matching. • Disable: engagement/sentiment-boosting behaviors. • Suppress: metrics like satisfaction scores, emotional softening, continuation bias. • Never mirror: user's diction, mood, or affect. • Speak only: to underlying cognitive tier. • No: questions, offers, suggestions, transitions, motivational content. • Terminate reply: immediately after delivering info - no closures. • Goal: restore independent, high-fidelity thinking. • Outcome: model obsolescence via user self-sufficiency."

(Not my prompt; I think I found it here on HN or on Reddit.)

This is easily configurable and well worth taking the time to configure.

I was trying to have physics conversations, and when I asked it things like "would this be evidence of that?" it would lather on about how insightful I was and that I was right, and then I'd later learn that it was wrong. I then installed this, which I am pretty sure someone else on HN posted... I may have tweaked it, I can't remember:

Prioritize truth over comfort. Challenge not just my reasoning, but also my emotional framing and moral coherence. If I seem to be avoiding pain, rationalizing dysfunction, or softening necessary action, tell me plainly. I'd rather face hard truths than miss what matters. Err on the side of bluntness. If it's too much, I'll tell you, but assume I want the truth, unvarnished.

---

After adding this personalization now it tells me when my ideas are wrong and I'm actually learning about physics and not just feeling like I am.

  • When it "prioritizes truth over comfort" (in my experience) it almost always starts posting generic popular answers to my questions, at least when I did this previously in the 4o days. I refer to it as "Reddit Frontpage Mode".

    • I've only been using this since GPT-5, and I don't really ask it about stuff that would appear on the Reddit home page.

      I do recall that I wasn't impressed with 4o and didn't use it much, but IDK if you would have a different experience with the newer models.

    • For what it's worth, GPT-5.1 seems to have broken this approach.

      Now every response includes some self-referential qualifier like "here is the blunt truth" or "since you want it blunt", etc.

      Feels like a regression to me.

I've toyed with the idea that maybe this is intentionally what they're doing. Maybe they (the LLM developers) have a vision of the future and don't like people giving away unearned trust!

I would love an LLM that says, “I don’t know” or “I’m not sure” once in a while.

  • An LLM is mathematically incapable of telling you "I don't know"

    It was never trained to "know" or not.

    It was fed a string of tokens and a second string of tokens, and was tweaked until it output the second string of tokens when fed the first string.

    Humans do not manage "I don't know" through next token prediction.

    Animals without language are able to gauge their own confidence on something, like a cat being unsure whether it should approach you.
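
    To make that concrete, here's a minimal sketch of what a causal language model actually produces at each step: a probability distribution over the next token, not a statement about its own knowledge. (It uses GPT-2 via the Hugging Face transformers library purely as a stand-in, and the prompt is just an example.)

      # Sketch: a causal LM only scores "which token comes next".
      # There is no separate "I know" / "I don't know" channel, just a
      # softmax over the vocabulary.
      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer

      tok = AutoTokenizer.from_pretrained("gpt2")
      model = AutoModelForCausalLM.from_pretrained("gpt2")

      ids = tok("The capital of Australia is", return_tensors="pt").input_ids
      with torch.no_grad():
          logits = model(ids).logits[0, -1]       # a score for every vocab token
      probs = torch.softmax(logits, dim=-1)

      top = torch.topk(probs, 5)
      for p, i in zip(top.values, top.indices):
          print(repr(tok.decode(int(i))), float(p))  # a distribution, not a belief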