Comment by anywhichway
7 months ago
> sometimes called prompt canarying or decoy system prompts.
Both "prompt canarying" and "decoy system prompts" give 0 hits on Google. Those aren't real things.
I did a search and found related terms: https://www.reddit.com/r/hacking/comments/1kqi0tm/how_canari...
https://medium.com/@tomer2138/how-canaries-stop-prompt-injec...
Those talk about a mechanism to detect prompt injection. If that were what was going on here, we should have seen the chatbot refuse, not lie.
Maybe it was trained on some internal documentation. ;)