Comment by CuriouslyC

1 day ago

The parent's model is right. You can mitigate a great deal with a basic zero trust architecture. Agents don't have direct secret access, and any agent that accesses untrusted data is itself treated as untrusted. You can define a communication protocol between agents that fails when the communicating agent has been prompt injected, as a canary.

More on this technique at https://sibylline.dev/articles/2026-02-15-agentic-security/

3 comments

CuriouslyC

what 13 hours ago

>You can define a communication protocol between agents that fails when the communicating agent has been prompt injected

Good luck with that.

aix1 12 hours ago
Yeah, how exactly would that work?
- CuriouslyC 5 hours ago
  
  A schema with response metadata (so responses that deviate from it fail automatically), plus a challenge question that's calibrated to be hard enough that the disruption of instruction following from prompt injection can cause the model to answer incorrectly.