Comment by dustfinger
18 hours ago
It would be interesting to know whether AI is less likely to follow rules when the instructions contain foul or demeaning language. Too bad we couldn't replay the scenario replacing NEVER F*ING GUESS! with:
**Never guess**
- All behavioral claims must be derived from source, docs, tests, or direct command output.
- If you cannot point to exact evidence, mark it as unknown.
- If a signature, constant, env var, API, or behavior is not clearly established, say so.
Underrated comment here. https://www.anthropic.com/research/emotion-concepts-function

This study convinced me to be "nice" to AI agents. At least as I understood it, there's something in the weights such that activating the "desperate" vector makes the model more likely to cheat or cut corners. So yes, I would err towards your suggested prompt over NEVER FUCKING GUESS.