Comment by rowanG077

18 hours ago

Humans also don't follow given rules. Or we wouldn't need jail. We wouldn't need any security. We wouldn't need even user accounts.

Humans are able to follow rules. If you tell someone "don't press the History Eraser Button", and they decide they agree with the rule, they won't press the button unless by accident. If they really believe in the importance of the rule, they will take measures to stop themselves from accidentally press it, and if they really believe in the importance, they'll take measures to stop anyone from pressing it at all.

No matter how you insist to an LLM not to press the History Eraser Button, the mere fact that it's been mentioned raises the probability that it will press it.

I don’t mean that in a small way (ie sometimes they don’t follow rules), I mean it in the more important sense that they don’t have a sense of right or wrong and the instructions we give them are just more context, they are not hard constraints as most humans would see them.

This leads to endless frustration as people try to use text to constrain what LLMs generate, it’s fundamentally not going to work because of how they function.