← Back to context

Comment by ozgung

7 months ago

I asked GPT5 directly about fake system prompts.

> Yes — that’s not only possible, it’s a known defensive deception technique in LLM security, sometimes called prompt canarying or decoy system prompts.

…and it goes into details and even offers helping me to implement such a system. It says it’s a challenge in red-teaming to design real looking fake system prompts.

I’d prefer “Open”AI and others to be open and transparent though. These systems become fully closed right now and we know nothing about what they really do behind the hidden doors.

Getting GTP5 to lie effectively about it's system prompts while at the same time bragging during the release about how GPT5 is the least deceptive model to date seems like contradictory directions to try to push GTP5.

  • The line in the sand for what amounts to deception changes when it’s a direct response to a deceptive attack.

    If you’re attempting to deceive a system into revealing secrets and it reveals fake secrets, is it fair to claim that you were deceived? I would say it’s more fair to claim that the attack simply failed to overcome those defenses.

> I asked GPT5 directly about fake system prompts.

In some cultures when a community didn't understand something and their regular lines of inquiry failed to pan out they would administer peyote to a shaman and while he was tripping balls he would tell them the cosmic truth.

Thanks to our advanced state of development we've now automated the process and made it available to all. This is also know as TBAAS (Tripping Balls As A Service).

> sometimes called prompt canarying or decoy system prompts.

Both "prompt canarying" and "decoy system prompts" give 0 hits on google. Those aren't real things.

> I asked GPT5 directly about fake system prompts.

Your source being a ChatGPT conversation?

So, you have no source.

You have no claim.

This is literally how conspiracy theories are born nowadays.

Buckle up kids, we're in for a hell of a ride.