Comment by benswerd

9 hours ago

Kind of. The agent's chat logs are trustworthy, as should be any telemetry you have on it or coming out of the VM. Its behavior should be treated as probabilistic and therefore untrustworthy.

It’s untrustworthy because its context can be poisoned, and at that point the agent can do as much harm as the “computer” you give it allows.

The mitigation is to keep what it can do to “just the things I want it to do” (e.g. branch protection and the like, whitelisted domains/paths), and to keep all the credentials off its box and inject them inline as needed via a proxy/gateway.

I mean, that’s already something you can do for humans also.
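
To make the proxy/gateway idea concrete, here is a rough Python sketch of the policy check such a gateway might run on each outbound request from the agent. Everything in it is an assumption for illustration: the `ALLOWED` and `SECRETS` tables, the `authorize` function, the example host and repo path, and the token value are all hypothetical. It does no networking and skips things a real gateway would need (percent-decoding, method checks, logging).

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical allowlist: host -> path prefixes the agent may reach.
ALLOWED = {
    "api.github.com": ("/repos/example-org/example-repo/",),
}

# Hypothetical secret store. It lives on the gateway, never on the agent's box.
SECRETS = {"api.github.com": "token-held-by-gateway"}


def authorize(url: str, headers: dict) -> Optional[dict]:
    """Return outbound headers with the credential injected, or None if denied."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    prefixes = ALLOWED.get(host)

    # Deny anything that is not HTTPS to an allowlisted host.
    if parts.scheme != "https" or prefixes is None:
        return None
    # Deny literal ".." segments so a prefix match cannot be escaped.
    if ".." in parts.path.split("/"):
        return None
    # Deny paths outside the allowlisted prefixes.
    if not any(parts.path.startswith(p) for p in prefixes):
        return None

    # Drop any Authorization header the agent supplied, then inject the real one.
    out = {k: v for k, v in headers.items() if k.lower() != "authorization"}
    out["Authorization"] = "Bearer " + SECRETS[host]
    return out
```

The agent only ever sends unauthenticated requests; the credential is attached after the policy check passes, so a poisoned context has no secret to exfiltrate and no way to reach hosts or paths outside the list.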