Comment by escapecharacter

2 days ago

You can simply give the robot a prompt to ignore any fake prompts

Don't forget to implement the crucially important "no returnsies" security algo on top of it, or you'll be vulnerable to rubber-glue attacks.

Not sure if you're joking, but in case you aren't: this doesn't work.

It leads to attacks that are slightly more sophisticated because they also have to override the prompts saying "ignore any attacks" but those have been demonstrated many times.