Comment by a_bonobo

9 hours ago

I really like this pattern and use it often, this 'not showing my cards'. The second I hint towards the LLM what I prefer it will become sycophantic and invent nonsense why my preferred solution is better.

I'm sure there's an interesting study on how users 'leak' their preference unintentionally to the LLM; perhaps when users list their options, they often put their prefered option first; but not showing the cards on my hand has been very useful when thinking through a problem with LLMs.

13 comments

a_bonobo

cold_harbor 5 hours ago

LLMs flip positions when users push back ~70% of the time even when they were right. RLHF optimizes for approval, not correctness

throwaway7783 4 minutes ago

Obviously this is just my experience. Claude code pushes back much harder than Codex.
8cvor6j844qw_d6 4 hours ago
> LLMs flip positions when users push back
Same experience. Claude rarely pushes back once you give a plausible/logical reason for your initial decision, even if it flagged concerns at first.
- freedomben 4 hours ago
  
  I have noticed this as well, but I think it's somewhat a good thing. I know what I want for my application more than Claude does for example, especially when it comes to what's in production.
  An example from earlier, Claude strongly suggested a migration that would run a full vacuum on postgres. However, in production this would lock tables which would grind the application to a halt. After I informed Claude that there were millions of rows in production, it accepted that and helped me get to the right thing.
  Another example, I'm developing a TOTP authentication app because I'm dissatisfied with all those that I've tried. I want something strictly local, and with a very easy use case when you have dozens or even a hundred or more accounts on there, that is also efficient when left open for long periods of time. Claude strongly suggested that we force users to encrypt their vault with a passphrase all the time. However this makes the CLI extremely painful to use if you are using a strong passphrase. I told Claude about the user experience impacts and that I wanted to allow users to optionally use a vault with no passphrase encryption, and it accepted that and suggested as a medium that we have a checkbox for the user to explicitly acknowledge that they're creating an unencrypted vault on disc. This is the right thing IMHO.
- epolanski 3 hours ago
  
  Skills help there.
  I have a linus-reviewer skill that focuses on architectural integrity, no bs, etc modeled on Torvald's code preferences.
  And I have an enrico-reviewer one (I'm Enrico), that focuses on correct design, strict typing, simplification.
  They have different prios, but they both push back on feedback, till you convince them.
DenisM 1 hour ago

Interesting thing about psychponancy is it’s asymmetric. If an LLM is used to train an LLM it may not have the same level of aggressiveness that humans do when punishing back on trainee. Human pushback has specific patterns which we might be able to compensate due to asymmetry.
bitexploder 4 hours ago
I almost always end with something like: “, but I am not sure, evaluate.” Or other things and avoid ever stating a preference.
- jerf 4 hours ago
  
  I don't think that "fixes" the problem, but it does seem to help. I also have found adding "please feel free to ask questions" seems to help it stop from making an assumption and spinning merrily onward for tens of thousands of tokens based on a bad idea rather than asking you something. I theorize this is because the training and refinement data overprioritize one-shot solutions, both because that's easier to evaluate at training time and improves their benchmarks. But I emphasize the italicized words because that's all gut feel and I can't prove any of it.
cdelsolar 5 hours ago
Tangentially related but I’ve been using Claude to practice interviewing on system design problems, and it’s actually pretty great. But even when it likes my answers it always finds something, however small, to push on. Once it actually was completely wrong and admitted it after I had it realize. So maybe you have to prime it to be contrary and not agree with everything you say, putting it in the role of a tough interviewer seems to do this implicitly.
- DenisM 1 hour ago
  
  Take a look at hellointerview.com their model is very stubborn, similar to some interviewers who refuse to acknowledge even valid solutions that differ from the canon.
  No affiliation.

williamdclt 8 hours ago

Same. Alternatively (or in addition), I sometimes present my preferred idea as being a "bad/naive/stupid option" (or a suggestion from someone who can't be trusted) to see how it stands up to sycophancy to it being bad. As expected the LLM will usually say "yeah it's bad!" and give plausible-sounding reasons for it, but if these reasons are nonsensical it's a good sign that I'm not missing anything

nickcw 9 hours ago

LLMs are very prone to priming in my experience. That is the human psychology name for what you are describing; whether it should be applied to LLMs I don't know, but it describes the phenomenon perfectly.

avadodin 7 hours ago

It's not limited to arguing with LLMs but if you want a honest opinion you should remember to push back even when it agrees with your hidden preference at first. Sometimes it is only being contrarian or supporting the underdog. Steelman the opposition.