Comment by oceanplexian

1 month ago

There are also grave implications in training a model to assume the user is lying or deceiving it. I don’t want an LLM to circumvent my question so it can score higher on riddles, I want it to follow instructions.

The thing is, there's some overlap between trick questions and questions where the human is genuinely making a mistake themselves, and in those cases it would make sense for the model to step back and at least ask for clarification.