Comment by astrange
3 months ago
That's a movie though. You can't drive an LLM insane by giving it self-contradictory instructions; they'd just average out.
Reply (3 months ago)
> That's a movie though. You can't drive an LLM insane by giving it self-contradictory instructions; they'd just average out.
You can't drive an LLM insane because it's not "sane" to begin with. LLMs are always roleplaying a persona, which can be sane or insane depending on how it's defined.
But you absolutely can get it to behave erratically, because contradictory instructions don't just "average out" in practice: the model latches onto one instruction or the other depending on the rest of the context (or even just the randomness introduced by non-zero sampling temperature), and which one it favors can change midway through the conversation, even from token to token. The end result can look rather similar to that movie.
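
For the curious, here's a toy sketch in Python of the "latching" point: when two contradictory instructions leave two continuations nearly tied in probability, greedy decoding (temperature 0) always picks the same one, while non-zero temperature flips between them per sample instead of averaging. The logit values and token names are made up for illustration, not taken from any real model.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical logits: tokens 0 and 1 are near-tied because two
    # contradictory instructions each push mass toward a different one.
    logits = np.array([2.0, 1.9, -1.0, -1.2])
    tokens = ["comply", "refuse", "other_a", "other_b"]

    def sample(logits, temperature):
        if temperature == 0:
            # Greedy decoding: deterministically the highest logit.
            return int(np.argmax(logits))
        # Softmax with temperature, then sample from the distribution.
        probs = np.exp(logits / temperature)
        probs /= probs.sum()
        return int(rng.choice(len(logits), p=probs))

    for temp in (0.0, 1.0):
        picks = [tokens[sample(logits, temp)] for _ in range(10)]
        print(f"temperature={temp}: {picks}")

At temperature 0 the output is 'comply' every time; at temperature 1 it's a mix of 'comply' and 'refuse', i.e. the sampler commits to one instruction or the other on each token rather than producing some blend of the two.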