Comment by benswerd

8 hours ago

I recommend running the agent harness outside of the computer. The mental model I like to use is the computer is a tool the agent is using, and anything in the computer is untrusted.

I would recommend not giving an agent the full run of any computing environment. Do you handle fine-grained internet access controls and credential injection like OpenShell does?

  • I used to believe this, but I think the next generation of agents is much more autonomous and just needs a computer.

    The work of a developer is open ended, so we use a computer for it. We don't try to box developers into small granular screwdrivers for each small thing.

    That's what's coming to all agents: they might want to run some analysis with Python, generate a website or document in TypeScript, and store data in Markdown files or in MongoDB. I expect them to get much more autonomous and, with that, to end up just needing computers like us.

    • The difference is that I am not always legally liable for what a rogue developer does with their computer - if I had no knowledge of what they were up to and had clear policies they violated then I'm probably fine. But I'm definitely always liable for anything an agent I created does with the computer I gave it.

      And while they are getting better I see them doing some spectacularly stupid shit sometimes that just about no person would ever do. If you tell an agent to do something and it can't do what it thinks you want in the most straightforward way, there is really no way to put a limit on what it might try to do to fulfill its understanding of its assignment.

The problem is the agent, which should be treated as untrusted. The computer isn’t the problem.

  • Kind of. The chat logs of the agent are trustworthy, as should be any telemetry you have on it or coming out of the VM. Its behavior should be treated as probabilistic and therefore untrustworthy.

    • It’s untrustworthy because its context can be poisoned and then the agent is capable of harm to the extent of whatever the “computer” you give it is capable of.

      The mitigation is to keep what it can do to “just the things I want it to do” (e.g. branch protection and the like, whitelisted domains/paths). And to keep all the credentials off its box and inject them inline as needed via a proxy/gateway.

      I mean, that’s already something you can do for humans also.
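
A rough sketch of the mitigation described above (allowlisted domains/paths, with credentials kept off the agent's box and injected by a gateway). Everything here is an assumption for illustration: the hosts, path prefixes, token value, and the names `ALLOWED`, `GATEWAY_SECRETS`, `is_allowed`, and `prepare_outbound` are hypothetical, and this is not how OpenShell or any particular product does it. It only shows the shape of the check; a real gateway would also need to handle things like percent-encoded paths and redirects.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical egress policy: host -> path prefixes the agent may reach.
ALLOWED = {
    "api.github.com": ("/repos/example-org/",),
    "pypi.org": ("/simple/",),
}

# Hypothetical secrets that live only on the gateway, never in the agent's VM.
GATEWAY_SECRETS = {"api.github.com": "token-held-by-gateway"}


def is_allowed(url: str) -> bool:
    """Return True only for HTTPS URLs whose host and path are allowlisted."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    prefixes = ALLOWED.get(parts.hostname or "")
    if prefixes is None:
        return False
    # Collapse "." and ".." segments so "/allowed/../other" cannot slip through.
    path = posixpath.normpath(parts.path or "/") + "/"
    return any(path.startswith(prefix) for prefix in prefixes)


def prepare_outbound(url: str, agent_headers: dict[str, str]) -> dict[str, str]:
    """Build the headers the gateway forwards upstream for an agent request."""
    if not is_allowed(url):
        raise PermissionError(f"blocked by egress policy: {url}")
    # Discard any Authorization header the agent supplied; the agent is untrusted.
    headers = {k: v for k, v in agent_headers.items() if k.lower() != "authorization"}
    secret = GATEWAY_SECRETS.get(urlsplit(url).hostname or "")
    if secret is not None:
        # Inject the real credential inline, outside the agent's environment.
        headers["Authorization"] = f"Bearer {secret}"
    return headers
```

The point of the design is that a poisoned context can at worst make requests the policy already permits, and it never has a credential to exfiltrate because the token is only attached after the request leaves the agent's machine.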