Comment by danshapiro

3 days ago

Co-author of the paper here. We don't know exactly why modern LLMs don't want to call you a jerk, or for that matter why persuasive techniques convince them otherwise. It's not a hard line like many of the guardrails. That said, I talked to Jesse about this, and I strongly suspect the same techniques will work for prompt conformance when the topic is something other than name-calling.

Isn't that just instruction fine-tuning and RLHF inducing style and deference? Why is that surprising?