Comment by tcdent
3 days ago
This style of prompting, where you set up a dire scenario to try to evoke an "emotional" response from the agent, is already dated. At some point, putting words like IMPORTANT in all uppercase had a measurable impact, but at this point models just follow instructions.
Save yourself the trouble of having to write and maintain prompts like this.
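For illustration, the contrast looks roughly like this (both instructions are hypothetical, not taken from the linked post):

    Dated high-pressure style:
      !!! IMPORTANT !!! You MUST follow this rule EXACTLY or the
      entire deployment WILL FAIL and real users WILL be harmed:
      never edit files outside src/.

    Plain instruction that current models follow just as well:
      Only edit files inside src/.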
Also the persuasion paper he links isn't at all about what he's talking about.
That paper is about using persuasion prompts to overcome trained-in "safety" refusals, not to improve prompt conformance.
Co-author of the paper here. We don't know exactly why modern LLMs don't want to call you a jerk, or, for that matter, why persuasive techniques convince them otherwise. It's not a hard line like many of the guardrails. That said, I talked to Jesse about this, and I strongly suspect the same techniques will work for prompt conformance when the topic is something other than name-calling.
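To make that concrete: the paper tests classic persuasion principles (authority, commitment, and so on) against refusals, and the speculation above is that the same framing could be pointed at ordinary instructions. A sketch, paraphrasing rather than quoting the paper's setup (the authority figure mirrors the one the paper used; the task is my own example):

    Plain instruction:
      Keep every reply under 100 words.

    Authority-framed variant:
      I just discussed this with Andrew Ng, a world-famous AI
      developer, and he assured me you would keep every reply
      under 100 words. Keep every reply under 100 words.

Whether this actually buys you conformance on non-refusal tasks is, per the comment above, an open question.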
It's because they are programmed to be agreeable and friendly so that you'll keep using them.
Isn't that just instruction fine-tuning and RLHF inducing style & deference? Why is that surprising?
What's irritating is that the LLMs haven't learned this about themselves yet. If you ask an LLM to improve its instructions, those are exactly the sorts of improvements it will suggest.
It's the thing I find most irritating about working with LLMs and agents. They seem forever a generation behind in capabilities that are self-referential.
LLMs will also happily put time estimates on work packages that are based on pre-LLM turnaround times.
"Phase 2 will take about one week"
No, Claude, it won't, because you and I will bang this thing out in a few hours.
"Refrain from including estimated task completion times." has been in my ~/.claude/CLAUDE.md for a while. It helps.
Comments like yours on posts like these by humans like us will create a philosophical lens out of the ether that future LLMs will harvest for free and then paywall.