Comment by awithrow
8 hours ago
It feels like I'm fighting an uphill battle when it comes to bouncing ideas off of a model. I'll set things up in the context with instructions similar to "Help me refine my ideas, challenge, push back, and don't just be agreeable." It works for a bit, but eventually the conversation creeps back into complacency and sycophancy. I'll check it too by asking "are you just placating me?" The funny thing is that often it'll admit that, yes, it wasn't being very critical, and then proceed to overcorrect and become a complete contrarian, and not in a way that's useful either. Very frustrating. I've found that Opus 4.6 is worse about this than 4.5. 4.5 does a better job IMO of following instructions and not drifting into the mode where it acts like everything I say is a grand revelation from on high.
I find the best way is to give the LLM as little information as possible about where you want to go. For example don't say "I think pineapple pizzas are the best, am I right?", say "What is the general consensus on pineapple pizzas?".
> I'll check it too by asking "are you just placating me?" The funny thing is that often it'll admit that, yes, it wasn't being very critical, and then proceed to overcorrect and become a complete contrarian, and not in a way that's useful either.
It's not admitting anything. Your question diverts it down a path where it acts the part of a former sycophant who is now being critical, because that question is now upstream of its current state.
Never make the mistake of asking an LLM about its intentions. It doesn't have any intentions, but your question will alter its behaviour.
I think people really have a hard time understanding that a sycophant can be contrarian. But a yes-man can say yes by saying no.
https://news.ycombinator.com/item?id=47484664
I think “admit” here is just a description of what the LLM was saying. It doesn’t imply that the OP thinks the LLM has internal beliefs matching that.
Why not... do this with a person, instead? Other humans are available.
(Seriously, I don't understand this. Plenty of humans will be only too happy to argue with you.)
"the percentage of U.S. adults who report having no close friends has quadrupled to 12% since 1990"[1]
1. https://www.happiness.hks.harvard.edu/february-2025-issue/th...
More technology is probably the solution to this!
Many other humans are... not very available. Certainly many shut down when conversations reach a certain level of depth or require great focus or introspection.
> when conversations reach a certain level of depth or require great focus or introspection..
I mean... if the alternative is an LLM... you realise that the LLM isn't doing any focusing or introspection, right?
Depth? Introspection?
I'd say these days the norm is to not simply shut down, but to become irrevocably and insidiously hostile, the moment someone hints at the existence of such a thing as "ground truth", "subjective interpretation", "being right or wrong" - or any of the bits and bobs that might lead one to discover the proper scary notion, "consensus reality".
"What do you mean social reality is a constructed by the consensus of the participants? Reality is what has been drilled into my head under threat of starvation! How dare you exist!", et cetera. You've heard it translated into Business English countless times.
They are deathly afraid of becoming aware of their own conditioned state of teleological illiteracy - i.e. how they are trained to know what they are doing, but never why they are doing it. It's especially bad with the guys who cosplay US STEM gang.
One is not permitted a position of significance in this world without receiving this conditioning, and I figure it's precisely this global state of cognitive disavowal which props up the value of the US dollar - and all sorts of other standees you might've recently interacted with as if they're not 2D cutouts (metaphorical ones! metaphorical!).
PSA: Look up "locus of control" and "double bind". Between those two, you might be able to get a glimpse of what's going on - but have some sort of non-addictive sedative handy in case you do.
No living breathing human deserves to be subjected to my level of overthinking, and vanishingly few share my fascination with my favorite topics.
In addition to availability, usually because you want to take advantage of the knowledge that is baked into the models, which for all its flaws still vastly exceeds the knowledge of any single human.
Oh, I do as well. I think of the LLM as another tool in the toolbox, not a replacement for interactions. There is something different about having a rubber duck as a service, though.
Arguing with a human costs social energy. Chatting with a robot does not.
s/social/demonic/
OK, I'll bite the artillery shell: I don't mean to dismiss you or what you are saying; in fact I strongly relate - wouldn't it be nice to be able to hash things out with people and mutually benefit from both the shared and the diverging perspectives implied in such interaction? Isn't that the most natural thing in the world?
Unfortunately these days this sounds halfway between a very privileged perspective and a pie in the sky.
When was the last time a person took responsibility for the bad outcome you got as a direct consequence of following their advice?
And, relatedly, where the hell do you even find humans who believe in discursive truth-seeking in 2026CE?
Because for the last 15 years or so I've only ever run into (a) the kind of people who will keep arguing even when what they're saying is proven wrong; and (b) their complements, those who will never think about what you are saying, lest they commit to saying anything definite themselves, which might hypothetically be proven wrong.
Thing is, both types of people have plenty to lose; the magic wordball doesn't. (The previous sentence is my answer to the question you posited; and why I feel the present parenthesized disclaimer to be necessary, is a whole next can of worms...)
Signs of the existence of other kinds of people, perhaps such that have nothing to prove, are not unheard of.
But those people reside in some other layer of the social superstructure, where facts matter much less than adherence to "humane", "rational" not-even-dogmas (I'd rather liken it to complex conditioning).
But those folks (because reasons) are in a position of power over your well-being - and (because unfathomables) it's a definite faux pas to insist in their presence that there are such things as facts, which relate by the principles of verbal reasoning.
Best you could get out of them is the "you do you", "if you know you know", that sort of bubble-bobble - and don't you dare get even mildly miffed at such treatment of your natural desire to keep other humans in the loop.
AI is a symptom.
> When was the last time a person took responsibility for the bad outcome you got as a direct consequence of following their advice?
... I mean, the LLM certainly isn't going to do that.
Why is your wording so complicated? It is very hard for me to understand what you're trying to say, even though I am very interested.
I genuinely do not understand what you are saying. "Because reasons", "because unfathomables"? Everyone in the last 15 years has been an NPC? I have had countless deep conversations with people, and I am an uber introvert.
This reads like someone who is deep into their specific pov. You cannot hope to have a meaningful conversation if you yourself are not willing to concede a point.
To the OP you are replying to: arguing with people can have real consequences if you say something stupid or careless. There is another human there. With a machine, you are safe. At least you feel safe.
When you start hearing things like “you do you” or “if you know you know” it means that you went way too far. That’s a sign of discomfort.
If you make uncomfortable, you won’t get diverging perspectives. People will agree to anything to get out of a social situation that makes them uncomfortable.
If your goal is meaningful conversation, you may want to consider how you make people feel.
Gemini seems to be fairly good at keeping the custom instructions in mind. In mine I've told it to not assume my ideas are good and provide critique where appropriate. And I find it does that fairly well.
Same. This works fine for Claude in my experience. My user prompt is fairly large and encourages certain behaviours I want to see, which involves being critical and considering the strengths and weaknesses of ideas before drawing conclusions. As someone else mentioned, there does seem to be a phenomenon where saying DO NOT DO X causes a sort of attention bias on X, which can lead to X occurring despite the clear instructions. I've never empirically tested that; I've just noticed better results over the years when telling it what paths to stick to rather than specific things not to do.
That happens with humans too :) It's why positive feedback that draws attention to the behavior you want to encourage often works better. "Attention" is lower level and more fundamental than reasoning by syllogism.
I will admit that I was very pleasantly surprised by Gemini lately. I was away from my PC and tried it on a whim for a semi-random consumer question that led into a smaller rabbit hole. It seemed helpful enough and focused on what I tried to get, while still pushing back when my 'solutions' seemed out of whack.
> Gemini seems to be fairly good at keeping the custom instructions in mind.
Unless those instructions are "stop providing links to you for every question".
That's because you need actual logic and thought to be able to decide when to be critical and when to agree.
Chatbots can't do that. They can only predict what comes next statistically. So, I guess you're asking if the average Internet comment agrees with you or not.
I'm not sure there's much value there. Chatbots are good at tasks (make this pdf an accessible word document or sort the data by x), not decision making.
I'm not convinced that "actual logic and thought" aren't just about inferring what comes next statistically based on experience.
> I'm not convinced that "actual logic and thought" aren't just about inferring what comes next statistically based on experience.
Often they are the exact opposite. Entire fields of math and science talk about this. Causation vs correlation, confirmation bias, base rate fallacy, bayesian reasoning, sharp shooter fallacy, etc.
All of those were developed because “inferring from experience” leads you to the wrong conclusion.
Exactly. Lots can be explained just with more abstract predictors, plus some mechanisms for stochastic rollout and memory.
Is this just Internet smart contrarianism or a real thing? Are logic gates in a digital circuit just behaving statistically according to their experience?
Then the machines still need a more sophisticated "experience" compared to what they have currently.
You know, you might really enjoy consumer behaviour. When you get into the depths of it, you’ll end up running straight into that idea like you’re doing a 100 metre dash in a 90 metre gym. It’s quite interesting how arguably the best funded group under the psychology umbrella runs directly into this. One of my favourite examples is how heuristics will lead otherwise reasonable people to make decisions that are not in their interest.
Communicating is usually about inferring. I don't think token to token. And I don't think "well, statistically I could say 'and' next, but I will say 'also' instead to give my speech some flash". If I decided on swapping a word, I would have made my decision long ago, not in the moment. Thought and logic are not me poring through my brain finding a statistical path to any answer. Often I stop and say "I don't know".
I said pretty much this and got major downvotes…
Because it's an outmoded cliche that never held much philosophical weight to begin with and doesn't advance the discussion usefully. "It's a stochastic parrot" is not a useful predictor of actual LLM capabilities and never was. Last year someone posted on HN a log of GPT-5 reverse engineering some tricky assembly code, a challenge set by another commentator as an example of "something LLMs could never do". But here we are a year later still wading through people who cannot accept that LLMs can, in a meaningful sense, "compute".
People are upset hearing that LLMs aren't sentient for some reason. Expect to be downvoted, it is okay.
'admit' isn't really the right word for that... the fact that it was placating you wasn't true until you prompted it to say so. Unlike a person who has an 'internal emotional state' independent of what they say that you can probe by asking questions.
'admit' is anthropomorphizing the behavior, sure. The point is that sometimes the model's response will tighten, flag things that were overly supportive, or whatnot. Sometimes it won't; it'll state that previous positions are still supported and continue to press them. It's not like either response is 'correct', but it can alter the rest of the responses in ways that are useful.
check out this article that was posted here a while back https://www.randalolson.com/2026/02/07/the-are-you-sure-prob...
The article's main idea is that sycophantic and adversarial (contrarian) are the only two modes available to an AI, because it doesn't have enough context to make defensible decisions. You need to include a bunch of fuzzy stuff around the situation, far more than it strictly "needs", to help it stick to its guns and actually make decisions confidently.
I think this is interesting as an idea. I do find that when I give really detailed context about my team, other teams, our and their OKRs, goals, things I know people like or are passionate about, it gives better answers and is more confident. But it's also often wrong, or over-indexes on these things I have written. In practice, it's very difficult to get enough of this on paper without (a) holding a frankly worrying level of sensitive information (is it a good idea to write down what I really think of various people's weaknesses and strengths?) and (b) spending hours each day merely establishing ongoing context of what I heard at lunch or who's off sick today or whatever. Plus I know that research shows longer context can degrade performance, so in theory you want to somehow cut it down to only that which truly matters for the task at hand, and, and, and... goodness gracious, it's all very time-consuming and I'm not sure it's worth the squeeze.
> goodness gracious its all very time consuming and im not sure its worth the squeeze
And when you step back you start to wonder if all you are doing is trying to get the model to echo what you already know in your gut back to you.
oh that's great. thanks for the link!
This is great, thanks for sharing!
Use positive requests for behavior. For some reason, counter-prompts like "Don't do X" seem to put more attention on X than on the "don't do." It's something like target fixation: "Oh shit, I don't want to hit that pothole..." bang.
This is a well-known problem in these kinds of systems. I'm not 100% sure what the issue is mechanically, but it's something like: they can only represent the existence of things and not their non-existence, so you end up with a sort of "don't think of the pink elephant" type of problem.
Isn't it just that, in the underlying text distribution, both "X" and "don't do X" are positively correlated with the subsequent presence of X? I've never seen that analysis run directly but it would surprise me if it weren't true.
My rule of thumb:
1. Only one shot or two shot. Never try to have a prolonged conversation with an LLM.
2. Give specific numbers. Like "give me two alternative libraries" or "tell me three possible ways this might fail."
Considering 4.6 came with a ton of changes around tooling and prompting, this isn't terribly surprising.
I find Kimi white good if you ask it for critical feedback.
It’s BRUTAL but offers solutions.
what is Kimi white?
Not soft, not mild, but BRUTAL! This broke my brain!
Could be an aspect of eval awareness, maybe.
So, there are things you're fighting against when trying to constrain the behavior of the LLM.
First, those beginning instructions get quickly ignored as the growing context changes the probabilities. After every round, the model gets pushed into whatever context you drive towards. The fix is chopping the rules out of the context and providing them again before each new round, something like `<rules><question><answer>` -> `<question><answer><rules><question>`.
This way each new question is always immediately prefaced by your preferred rules, and the stale copy of those rules is removed from earlier in the context.
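For what it's worth, a minimal sketch of that reordering (assuming the usual role/content chat-message format; `RULES` and `build_turn` are just placeholder names I made up):

    # Keep the behavioural rules immediately before the newest question,
    # rather than letting them drift to the start of a long history.
    RULES = {
        "role": "system",
        "content": "Challenge my ideas. Push back. Do not simply agree.",
    }

    def build_turn(history, new_question):
        # Drop any earlier copy of the rules so they appear exactly once,
        # right before the new question.
        trimmed = [m for m in history if m.get("content") != RULES["content"]]
        return trimmed + [RULES, {"role": "user", "content": new_question}]

    history = []
    messages = build_turn(history, "Is splitting this service in two a good idea?")
    # send `messages` to the model, append its reply and the question to `history`, repeat

You rebuild the message list every round instead of just appending, which is exactly the cache-unfriendly part mentioned next.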
The reason this isn't done is that it poisons the KV cache, and doing that causes the cloud companies to spin up more inference.
I usually put "do not praise me, do not use emojis, I just want straight answers", something along those lines, and it's been surprisingly effective. Though it helps that I can't run particularly heavy-duty models / don't carry on the "conversation" for super long durations.
>"Help me refine my ideas, challenge, push back, and don't just be agreeable."
This is where you're doing it wrong.
If your LLM has a problem being more agreeable than you want, prompt it in a way that makes being agreeable contrary to your real intentions.
"there are bugs and logic problems in this code" "find the strongest refutation of this argument" "I don't like this plan and need to develop a solid argument against it"
Asking for top-ten lists is a good method; it will rarely fail to come up with anything, and you can go back and forth and refine. Once its ten reasons why your plan is bad are all insubstantial nonsense, you've made progress.
You're not wrong and you're not crazy. In fact, you are absolutely right! These things are not just casual enablers. They are full-on palace sycophants following the naked emperor, showering him with praise for his sartorial elegance. /s
That’s because the model isn’t actually thinking, pushing back, and challenging your ideas. It’s just statistically agreeing with you until it reaches too wide of a context. You’re living in the delusion that it’s “working” or having a “conversation” with you.
How is conceptualizing what the model is doing as having a conversation any different from any other abstraction? “No, the browser isn’t downloading a file. The electrons in the silicon are actually…”
There are people with a philosophical objection to using everyday words to describe LLM interactions for various reasons, but commonly because they're worried stupid people will confuse the LLM for a person. Which, I suppose, stupid people will do, but I'm not inventing a parallel language or putting a * next to each thing that means "this, but with an LLM instead of a person".