← Back to context

Comment by mgraczyk

2 days ago

I don't interpret this as lying. It's role-playing a "geoguesser" as prompted.

Imagine if you prompted the model the "play geoguesser", and it responded with something like "Well I actually ran python to pull out metadata and I know the exact location". That would be a bad response to the prompt. It's role-playing as if it played the game in its response, and I think that's the right thing for the model to do in this circumstance. If you tell it not to use the metadata, it won't.