Comment by alphazard
19 hours ago
The thing that always trips me up is the lack of isolation/sandboxing in all of the AI programming tools. I want to orchestrate a workforce of agents, but they can't be trusted not to run amok.
Does anyone have a better way to do this than spinning up a cloud VM to run goose or claude or whatever poorly isolated agent tool?
I have seen Claude disable its sandbox. Here is the most recent example, from a couple of weeks ago while debugging Rust: "The panic is due to sandbox restrictions, not code errors. Let me try again with the sandbox disabled:"
I have since added a sandbox around my ~/dev/ folder using sandbox-exec on macOS. It is a pain to configure properly, but at least I know where the sandbox is controlled.
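Roughly, the shape is: write a profile that allows everything except file writes, then re-allow writes under the directories you care about, and launch the agent through sandbox-exec. A minimal sketch (the profile below is illustrative, not my real one; a usable profile needs many more carve-outs before tools stop breaking, and note that sandbox-exec is deprecated but still ships with macOS):

    # Illustrative profile: deny file writes everywhere, then re-allow
    # them under $HOME/dev and /private/tmp. SBPL paths must be
    # absolute, so HOME is passed in as a profile parameter.
    cat > dev-only.sb <<'EOF'
    (version 1)
    (allow default)
    (deny file-write*)
    (allow file-write*
      (subpath (string-append (param "HOME") "/dev"))
      (subpath "/private/tmp"))
    EOF

    # Run the agent inside the sandbox.
    sandbox-exec -D "HOME=$HOME" -f dev-only.sb claude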
That refers to the sandbox "escape hatch" [1]: running a command without the sandbox requires a separate approval, so you get another prompt even if the command itself has been pre-approved. Their system prompt [2] is too vague about what kinds of failures the sandbox can cause; in my experience the agent always jumps straight to disabling the sandbox when a command fails. Probably best to disable the escape hatch and deal with failures manually.
[1] https://code.claude.com/docs/en/sandboxing#configure-sandbox...
[2] https://github.com/Piebald-AI/claude-code-system-prompts/blo...
I'm working on a solution [0] for this. My current approach (sketched below) is:
1. Create a new Git worktree
2. Create a Docker container w/ bind mount
3. Provide an interface for easily switching between your active worktrees/containers.
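As a sketch of steps 1 and 2 (the branch name and image here are illustrative, not what [0] actually uses):

    # 1. A fresh worktree so the agent can't touch other branches' state.
    BRANCH=agent-task-1
    git worktree add "../worktrees/$BRANCH" -b "$BRANCH"

    # 2. A container that bind-mounts only that worktree; the agent never
    #    sees the rest of the repo or the host home directory.
    docker run -it --rm \
      -v "$PWD/../worktrees/$BRANCH:/work" \
      -w /work \
      node:22 bash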
For credentials, I have an HTTP/HTTPS MITM proxy [1] that runs on the host and holds the creds, so there are zero secrets in the container.
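The wiring looks roughly like this (the general shape, not the actual code in [1]: the proxy injects auth on the way out, and the container only ever sees the proxy URL and the proxy's CA cert):

    # Illustrative: mitmproxy runs on the host with the real API keys.
    # The container trusts its CA via NODE_EXTRA_CA_CERTS (Claude Code
    # is a Node app) and routes all traffic through the proxy.
    docker run -it --rm \
      -e HTTP_PROXY=http://host.docker.internal:8080 \
      -e HTTPS_PROXY=http://host.docker.internal:8080 \
      -e NODE_EXTRA_CA_CERTS=/mitm-ca.pem \
      -v "$HOME/.mitmproxy/mitmproxy-ca-cert.pem:/mitm-ca.pem:ro" \
      -v "$PWD/../worktrees/$BRANCH:/work" -w /work \
      node:22 bash
    # On Linux, add: --add-host=host.docker.internal:host-gateway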
The end goal is to be able to manage, say, 5-10 Claude instances at a time. I want something like Claude Code for Web, but self-hosted.
[0]: https://github.com/shepherdjerred/monorepo/tree/main/package...
[1]: https://github.com/shepherdjerred/monorepo/pull/156
This is also what I did. Actually, Claude did it.
If they cannot be trusted, why would you use them in the first place?
For the same reason you'd build a fire.
Obviously people perceive value there, but on the surface it does seem odd.
"These things are more destructive than your average toddler, so you need to have a fence in place kind of like that one in Jurassic Park, except you need to make sure it absolutely positively cannot be shut off, but all this effort is worthwhile, because, kind of like civets, some of the artifacts they shit out while they are running amok appear to have some value."
It’s shocking, the collective shrug I get from our security people at work. I attend pretty serious meetings about genAI implementations, and when I ask about points of view on security, given that things as crazy as “adversarial poetry” are a real thing, I just get shrugs. I get the feeling they don’t want to be the ones to say “no, don’t bring genAI to our clients,” but they also won’t dare say “yes, our clients’ data is safe with integrated genAI.”
Love the mix of metaphors.
I run them inside a sandbox: https://github.com/ashishb/amazing-sandbox