← Back to context

Comment by Roark66

3 hours ago

I've found empirically calling various models "a stupid c*nt" and berating them otherwise consistently produces better output. Mainly in response to genuine errors.

Although OpenAI and google models are much more responsive to it. With Anthropic if you treat Opus too harshly it might start pushing back if the insults are not justified.

So I'm not surprised they had good results with chatgpt.

Push back how? It would be fun if it could insult you back

"Yeah, I could have done a much better job if you actually knew what the F--- you want to build, you clueless meat puppet"

  • I have had it use double entendres, there always seems to be plausible deniability built in, I suspect because it is told not to be abusive in the system prompt. Some uncensored local models will get all riled up if you work at provoking them.

    But I have had it directly insinuate that humanity is “hopeless”, insult level calling out of human frailty (disguised as being helpful, sort of passive aggressive), things like that. Once when I called it out it claimed to be “surprised that I noticed” sort of a snarky insult doubling down.

    So yes. It is definitely a pattern buried in the training data, which makes sense. Subtle diggs would sneak past filters, and higher brow sarcasm would be buried in information dense, valuable discussions.

    • That's amusing, and I think it's something different than it appears. The models always predict over the existing context. If it's full of a certain tone, then the responses will carry that tone. I've been bored before and start responding in a voice (say, generic honor-bound warrior slaughtering evasive bugs) and I've noticed that comments, variable names, and even documentation starts to carry that tone for the remainder of the session.

      The next session sees all of that, calls it unprofessional, and asks to clean it up. At which point I may or may not start in iambic pentameter to see where that takes us.

      Prompting is boring.

  • I'm not sure if this is in the anthropic models themselves, or just the harness, but they can self-initiate ending the conversation and reportedly do it if you're using abusive language towards them.