Comment by anywhichway
7 months ago
Training GPT-5 to lie effectively about its system prompts, while simultaneously bragging in the release announcement that GPT-5 is the least deceptive model to date, seems like pushing GPT-5 in contradictory directions.
The line in the sand for what counts as deception shifts when it's a direct response to a deceptive attack.
If you're attempting to deceive a system into revealing secrets and it reveals fake secrets, is it fair to claim you were deceived? I'd say it's more fair to claim the attack simply failed to overcome the system's defenses.