Comment by Jensson
10 days ago
Of course there is: you just train it on questions where you know the answer. Then it will always get caught, and it won't even consider the possibility of getting away with a lie, since that never happened during training.
Creating that training set, though, might cost many trillions of dollars, since you would basically need to recreate the equivalent of the internet, but without any lies, bad intentions, and so on.
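To make the idea concrete, here is a minimal, hypothetical sketch (not anything from the thread): every training question carries a known ground-truth answer, so a fabricated response is always caught and penalized. `ToyModel` and its methods are invented placeholders standing in for a real LLM and its update rule.

```python
# Sketch: training where the checker always knows the answer,
# so the model never "gets away" with a lie.
training_set = [
    {"question": "What is 17 * 24?", "answer": "408"},
    {"question": "What year was the Apollo 11 landing?", "answer": "1969"},
]

def reward(response: str, ground_truth: str) -> float:
    # The checker has the ground truth, so any lie yields the penalty.
    return 1.0 if ground_truth in response else -1.0

class ToyModel:
    def generate(self, question: str) -> str:
        return "I don't know"  # stand-in for a real LLM's output

    def update(self, question: str, response: str, r: float) -> None:
        pass  # stand-in for a policy-gradient / RLHF-style update

model = ToyModel()
for example in training_set:
    response = model.generate(example["question"])
    model.update(example["question"], response, reward(response, example["answer"]))
```

The expensive part, as noted above, is building a question set broad enough to cover everything the model might lie about, which is why the cost estimate is so large.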
Truthfulness doesn't always align with honesty. The LLM should have said: "Oops, I saw the EXIF data, please pick another image."
And I don't even think it's a matter of the LLM being malicious. Humans playing games get their reward from fun, and will naturally reset the game if the current conditions don't lead to any.