Comment by zozbot234

1 day ago

The access to the secret, the long-term persisting/reasoning and the posting should all be done by separate subagents, and all exchange of data among them should be monitored. But this is easy in principle, since the data is just a plain-text context.

1 comment

zozbot234

grasper_ 18 hours ago

Easy in principle is doing a lot of work here. Splitting things into subagents sounds good in theory, but if a malicious prompt flows through your plain-text context stream, nothing fundamental has changed. If the outward-facing agent gets injected and passes along a reasonable looking instruction to the agent holding secrets, you haven’t improved security at all.