Comment by astrange
3 months ago
That's a movie though. You can't drive an LLM insane by giving it self-contradictory instructions; they'd just average out.
Reply (3 months ago)
> That's a movie though. You can't drive an LLM insane by giving it self-contradictory instructions; they'd just average out.
You can't drive an LLM insane because it's not "sane" to begin with. LLMs are always roleplaying a persona, which can be sane or insane depending on how it's defined.
But you absolutely can get it to behave erratically, because contradictory instructions don't just "average out" in practice: the model latches onto one instruction or the other depending on the rest of the context (or even just the randomness introduced by non-zero sampling temperature), and which one it favors can change midway through the conversation, even from token to token. The end result can look rather similar to that movie.
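
For the curious, here's a toy sketch in Python of the "latching" point: when two contradictory instructions leave two continuations nearly tied in probability, greedy decoding (temperature 0) always picks the same one, while non-zero temperature flips between them per sample instead of averaging. The logit values and token names are made up for illustration, not taken from any real model.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical logits: tokens 0 and 1 are near-tied because two
    # contradictory instructions each push mass toward a different one.
    logits = np.array([2.0, 1.9, -1.0, -1.2])
    tokens = ["comply", "refuse", "other_a", "other_b"]

    def sample(logits, temperature):
        if temperature == 0:
            # Greedy decoding: deterministically the highest logit.
            return int(np.argmax(logits))
        # Softmax with temperature, then sample from the distribution.
        probs = np.exp(logits / temperature)
        probs /= probs.sum()
        return int(rng.choice(len(logits), p=probs))

    for temp in (0.0, 1.0):
        picks = [tokens[sample(logits, temp)] for _ in range(10)]
        print(f"temperature={temp}: {picks}")

At temperature 0 the output is 'comply' every time; at temperature 1 it's a mix of 'comply' and 'refuse', i.e. the sampler commits to one instruction or the other on each token rather than producing some blend of the two.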