← Back to context

Comment by what

2 days ago

>You can define a communication protocol between agents that fails when the communicating agent has been prompt injected

Good luck with that.

2 comments

what

Reply

aix1 2 days ago

Yeah, how exactly would that work?

CuriouslyC 1 day ago

A schema with response metadata (so responses that deviate from it fail automatically), plus a challenge question that's calibrated to be hard enough that the disruption of instruction following from prompt injection can cause the model to answer incorrectly.