Comment by danshapiro

4 months ago

Co-author of the paper here. We don't know exactly why modern LLMs don't want to call you a jerk, or for that matter why persuasive techniques convince them otherwise. It's not a hard line like many of the guardrails. That said, I talked to Jesse about this, and I strongly suspect the same techniques will work for prompt conformance when the topic is something other than name-calling.

2 comments

make3 4 months ago

Isn't that just instruction fine-tuning and RLHF inducing style and deference? Why is that surprising?

diamond559 4 months ago

It's because they are programmed to be agreeable and friendly so that you'll keep using them.