Comment by jFriedensreich

1 month ago

I don't know how to feel about being the only one refusing to run YOLO mode until the tooling is there, which is still about 6 months away for my setup. Will I be years behind everyone else by then? You can get pretty far without completely giving in. Agents really don't need to execute that many arbitrary commands. Linting, search, edit, and web access should all be bespoke tools integrated into the permission and sandbox system. Agents should not even be allowed to start and stop applications that support dev mode: they edit files, can run tests, and can read the logs, so what else would they need to do? Especially as the number of external dependencies that make sense shrinks to a handful, you can approve every new one without headache. If your runtime supports sandboxing and permissions, like Deno or workerd, this adds an initial layer of defense.

This makes it even more baffling that Anthropic went with Bun, a runtime without any sandboxing or security architecture, and will rely on Apple Seatbelt alone.
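To make the "bespoke tools instead of arbitrary commands" idea in the comment above concrete, here is a minimal sketch. Everything in it (`ToolRegistry`, `PermissionDenied`, `build_registry`, the in-memory workspace) is hypothetical and invented for illustration; it is not how any particular agent or runtime implements permissions, just one way the shape could look.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


class PermissionDenied(Exception):
    """Raised when the agent asks for a tool it has not been granted."""


@dataclass
class ToolRegistry:
    """Hypothetical dispatcher: the agent can only call named, granted tools."""

    tools: Dict[str, Callable[..., str]] = field(default_factory=dict)
    granted: Set[str] = field(default_factory=set)

    def register(self, name: str, fn: Callable[..., str], granted: bool = False) -> None:
        self.tools[name] = fn
        if granted:
            self.granted.add(name)

    def call(self, name: str, **kwargs) -> str:
        # Unknown and ungranted tools are rejected the same way, so there is
        # no generic "run this command" escape hatch.
        if name not in self.tools or name not in self.granted:
            raise PermissionDenied(name)
        return self.tools[name](**kwargs)


def build_registry(files: Dict[str, str]) -> ToolRegistry:
    """Wire up two illustrative tools over an in-memory workspace (an assumption)."""

    def search(needle: str) -> str:
        # Return the sorted paths of files whose text contains the needle.
        return "\n".join(sorted(p for p, text in files.items() if needle in text))

    def edit(path: str, content: str) -> str:
        # Overwrite (or create) a file in the workspace.
        files[path] = content
        return f"wrote {path}"

    registry = ToolRegistry()
    registry.register("search", search, granted=True)
    registry.register("edit", edit, granted=True)
    return registry


if __name__ == "__main__":
    registry = build_registry({"a.py": "import os", "b.py": "print(1)"})
    print(registry.call("search", needle="import"))
    try:
        registry.call("shell", command="curl https://example.com")
    except PermissionDenied as exc:
        print(f"denied: {exc}")
```

The point of the design is that the allowlist is the default: a request like `shell` fails not because someone blocked it, but because nobody ever registered it.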

You use YOLO mode inside some sandbox (VM, container). Give the container only access to the necessary resources.

  • But even then, the agent can still exfiltrate anything from the sandbox, using curl. Sandboxing is not enough when you deal with agents that can run arbitrary commands.

    • What is your threat model?

      If you're worried about a hostile agent, then indeed sandboxing is not enough. In the worst case, an actively malicious agent could even try to escape the sandbox with whatever limited subset of commands it's given.

      If you're worried about prompt injection, then restricting access to unfiltered content is enough. That would definitely involve not processing third-party input and removing internet search tools, but the restriction probably doesn't have to be mechanically complete if the agent has also been instructed to use local resources only. Even package installation (uv, npm, etc) would be fine up to the existing risk of supply-chain attacks.

      If you're worried about stochastic incompetence (e.g. the agent nukes the production database to fix a misspelled table name), then a sandbox to limit the 'blast radius' of any damage is plenty.

    • It depends on what you're trying to prevent.

      If your fear is exfiltration of your browser sessions and your computer joining a botnet, or accidental deletion of your data, then a sandbox helps.

      If your fear is the LLM exfiltrating code you gave it access to, then a sandbox is not enough.

      I'm personally more worried about the former.

    • The whole point of the sandbox is that you don’t put anything sensitive inside of it. Definitely not credentials or anything sensitive/confidential.

    • That depends on how you configure or implement your sandbox. If you let it have internet access as part of the sandbox, then yes, but that is your own choice.

  • Right idea, but the reason people don't do this in practice is friction. Setting up a throwaway VM for every agent session is annoying enough that everyone just runs YOLO on their host.

    I built shellbox (https://shellbox.dev) to make this trivial -- Firecracker microVMs managed entirely over SSH. Create a box, point your agent at it, let it run wild. You can duplicate a box before a risky operation (instant, copy-on-write) and delete it after.

    Billing stops when the SSH session disconnects.

    No SDK, no container config, just ssh. Any agent that can run shell commands works out of the box.

  • Apart from nearly no one using VMs as far as I can tell, even if they were, a VM does not magically solve all the issues; it's just one part of the needed tooling.
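Several replies above turn on whether the sandbox has internet access at all. As a rough illustration of the middle ground (network allowed, but only to a few hosts), here is a sketch of an egress allowlist check. The host list and the `egress_allowed` helper are assumptions made up for this example, and a check like this would only mean something if enforced outside the agent's reach, for instance in a proxy or firewall at the sandbox boundary, not inside a process the agent controls.

```python
from urllib.parse import urlsplit

# Illustrative allowlist (an assumption): package index hosts only.
ALLOWED_HOSTS = frozenset({"pypi.org", "files.pythonhosted.org"})


def egress_allowed(url: str, allowed: frozenset = ALLOWED_HOSTS) -> bool:
    """Return True only if the URL's host exactly matches an allowed host."""
    host = urlsplit(url).hostname  # None when the URL has no scheme or authority
    if host is None:
        return False
    # Exact match on purpose: no suffix or wildcard matching, so a host like
    # "pypi.org.evil.example" does not slip through.
    return host.lower().rstrip(".") in allowed


if __name__ == "__main__":
    for url in (
        "https://pypi.org/simple/requests/",
        "https://evil.example/?q=secret",
        "https://pypi.org.evil.example/",
    ):
        print(url, egress_allowed(url))
```

Even with such a filter, the caveat raised in the thread still applies: an allowed host that accepts uploads can itself become an exfiltration channel, so this narrows the problem without closing it.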