Comment by AnotherGoodName

1 day ago

Add this to .claude/settings.json:

  {
    "sandbox": {
      "enabled": true,
      "filesystem": {
        "allowRead": ["."],
        "denyRead": ["~/"],
        "allowWrite": ["."],
        "denyWrite": ["/"]
      }
    }
  }

You can change the read rules if you're OK with it reading outside the project. This feature was only added 10 days ago, fwiw, but it's great and does pretty much this.

I've seen claude get confused about what directory it's in. And of course I've seen claude run rm -rf *. Fortunately not both at the same time for me, but it's not hard to imagine. The claude sandbox is a good idea, but to be effective it would need to be implemented at a very low level and enforced on all programs that claude launches. Also, claude itself is an enormous program that is mostly developed by AI. So having a small (<3000-line), human-implemented program as another layer of defense offers meaningful additional protection.

  • In my opinion Claude should be shipped by a custom implementation of "rm" that Anthropic can add guardrails to. Same with "find"; I'm surprised they don't just embed ripgrep (which is what VS Code does). It's really surprising they don't just tweak what Claude uses and lock it down to where it cannot be harmful. Ensure it only ever calls tooling Claude Code provides.

    • Oh, rm failed, since we're running in a weird environment! Let me retry with `bash -c "/usr/bin/rm -rf *"`!

    • > a custom implementation of "rm" that Anthropic can add guardrails to

      Wrong layer. You want the deletion to actually be impossible from a privilege perspective, not be made practically harder to the entity that shouldn't delete something.

      Claude definitely knows how to reimplement `rm`.

    • Why can't you ship with OverlayFS, which actually enforces these restrictions?

      I have seen the AI break out of my (admittedly flimsy) guards, simply by doing `safepath/../../stuff`, or something even more convoluted like symlinks.

    • > It's really surprising they don't just tweak what Claude uses and lock it down to where it cannot be harmful. Ensure it only ever calls tooling Claude Code provides.

      That would make it far less useful in general.


    • You can define your own rm shell alias/function and it will use that. I also have cp/mv aliases that force -i to avoid accidental clobbering, and it confuses Claude to no end (it uses cp/mv so rarely, more rarely than it should, that I don’t bother wasting memory tokens on it).


    • > Claude should be shipped by a custom implementation of

      And when that fails for some reason it will happily write and execute a Python script bypassing all those custom tools

  • > The claude sandbox is a good idea, but to be effective it would need to be implemented at a very low level and enforced on all programs that claude launches.

    I feel like an integration with bubblewrap, the sandboxing tech behind Flatpak, could be useful here. Have all executed commands wrapped with a BW context to prevent and constrain access.

    https://github.com/containers/bubblewrap

  • On Linux, chroot(2) is hard to escape and would apply to all child processes without modification.

    • We anthropomorphize these agents in every other way. Why aren't we using plain ol' unix user accounts to sandbox them?

      They look a lot like daemons to me: a program that you want hanging around, ready to respond, and maybe acting autonomously through cron jobs or similar. You want to assign any number of permissions to them, but you don't want them to have access to root or necessarily any of your personal files.

      It seems like the permissions model broadly aligns with how we already handle a lot of server software (and potentially malicious people) on unix-based OSes. It is a battle-tested approach that the agent is unlikely to be able to "hack" its way out of. I mean we're not really seeing them go out onto the Internet and research new Linux CVEs.

      Have them clone their own repos in their own home directory too, and let them party.

      Openclaw almost gets there! It exposes a "gateway" which sure looks like a daemon to me. But then for some reason they want it to live under your user account with all your privileges and in a subfolder of your $HOME.


    • That comparison is made on the project homepage:

      "Not a security mechanism. No mount isolation, no PID namespace, no credential separation. Linux documents it as not intended for sandboxing."

    • chroot is not a security sandbox. It is not a jail.

      Escaping it is something that does not take too much effort. If you have ptrace, you can escape without privileges.


  • I added a hook to disable rm, find -delete, and a few of the other more obvious destructive ops. It sends Claude a strongly worded message: "STOP IMMEDIATELY. DO NOT TRY TO FIND WORKAROUNDS...".

    It works well. `git rm` is still allowed.

    • I added something similar. Claude eventually ran `rm -rf *` on my own project. When I asked why it did that, it recognized it had messed up and offered a very bad “apology”: “the irony of not following your safety instructions isn’t lost on me”.

      Nowadays I only run Claude in Plan mode, so it doesn’t ask me for permissions any more.

    • It works well so far, for you.

      Are you confident it would still work against sophisticated prompt injection attacks that override your "strongly worded message"?

      Strongly worded signs can be great for safety (actual mechanisms preventing undesirable actions from being taken are still much better), but are essentially meaningless for security.


  • I added this to `~/.claude/settings.json`:

    "env": { "CLAUDE_BASH_MAINTAIN_PROJECT_WORKING_DIR": "1" },

    > Working directory persists across commands. Set CLAUDE_BASH_MAINTAIN_PROJECT_WORKING_DIR=1 to reset to the project directory after each command.

    It reduces one problem - getting lost - but it trades it off for more complex commands on average since it has to specify the full path and/or `cd &&` most of the time.

    [0] https://code.claude.com/docs/en/tools-reference#bash-tool-be...

  • One could run a docker container with claude code, with a bind to the project directory. I do that but also run my docker daemon/container in a Linux VM.

  • That is exactly what it is. In the docs, it says that they use bubblewrap to run commands in a container that enforces file and network access at the system level.
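For anyone who wants the sandbox enforced outside the agent, as suggested upthread, here is a minimal sketch of wrapping a command with bubblewrap directly. It assumes Linux with the `bwrap` binary installed; `bwrap_argv` and `run_sandboxed` are hypothetical names, and the policy (read-only host, empty tmpfs over $HOME, only the project dir writable, no network by default) is just one example, not what Claude Code itself configures.

```python
import os
import shlex
import subprocess

def bwrap_argv(project_dir, command, allow_network=False, hide_home=True):
    """Build a bwrap command line that runs `command` with the host filesystem
    mounted read-only and only `project_dir` writable."""
    project_dir = os.path.realpath(project_dir)
    argv = [
        "bwrap",
        "--ro-bind", "/", "/",   # entire host filesystem, read-only
        "--dev", "/dev",         # minimal private /dev
        "--proc", "/proc",       # fresh /proc for the new PID namespace
        "--tmpfs", "/tmp",       # throwaway /tmp
    ]
    if hide_home:
        # Empty tmpfs over $HOME, similar in spirit to "denyRead": ["~/"]
        argv += ["--tmpfs", os.path.expanduser("~")]
    argv += [
        "--bind", project_dir, project_dir,  # the only writable host path
        "--unshare-all",         # new user/pid/net/ipc/uts/cgroup namespaces
        "--die-with-parent",     # kill the sandbox if the caller exits
        "--chdir", project_dir,
    ]
    if allow_network:
        argv.append("--share-net")  # keep host networking despite --unshare-all
    return argv + list(command)

def run_sandboxed(project_dir, command, **kwargs):
    argv = bwrap_argv(project_dir, command, **kwargs)
    print("running:", shlex.join(argv))  # inspect before trusting it
    return subprocess.run(argv)
```

Because the mounts are applied by the kernel, every child process (including a `bash -c "/usr/bin/rm -rf *"` retry) inherits them, e.g. `run_sandboxed(".", ["bash"])`.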

I think the point is that some future revision of claude-code could remove or rename the config option just as silently as it was introduced.

People might genuinely want some other software to do the sandboxing. Something other than the fox.

Alternatively, the "feel free to leak all my data but please use my GPUs and don't rm -rf /" config:

  {
    "sandbox": {
      "enabled": true,
      "filesystem": {
        "allowRead": ["/"],
        "allowWrite": [
          ".",
          "/tmp",
          "/dev/nvidia0",
          "/dev/nvidia1",
          "/dev/nvidia2",
          "/dev/nvidia3",
          "/dev/nvidia4",
          "/dev/nvidia5",
          "/dev/nvidia6",
          "/dev/nvidia7",
          "/dev/nvidia8",
          "/dev/nvidiactl",
          "/dev/nvidia-uvm"
        ]
      }
    }
  }
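The device list above is hard-coded for a 9-GPU box. A small sketch that generates the same shape of config from whatever `/dev/nvidia*` nodes exist; the sandbox keys are copied from the config above, and the glob is an assumption about your driver's device naming, so review the output before pasting it in (e.g. `/dev/nvidia-caps` is a directory on some systems).

```python
import glob
import json

def gpu_sandbox_settings(device_paths):
    """Settings in the shape of the config above: read anywhere, write only
    to the project, /tmp and the given NVIDIA device nodes."""
    return {
        "sandbox": {
            "enabled": True,
            "filesystem": {
                "allowRead": ["/"],
                "allowWrite": [".", "/tmp", *sorted(device_paths)],
            },
        }
    }

if __name__ == "__main__":
    # Typically matches /dev/nvidia0..N, /dev/nvidiactl, /dev/nvidia-uvm, ...
    print(json.dumps(gpu_sandbox_settings(glob.glob("/dev/nvidia*")), indent=2))
```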

Battle-hardened tools for this have existed for decades; we don't need new ones. Just run claude as a user without access to those directories, so that the containment is inherited by subprocesses.

  • You can do that, but you need root to set it up each time, and it's not super convenient--you need to decide in advance which user account you are going to work under, and you may end up with files you can read from your regular account. Think of jai strict mode as a slightly easier to use and more secure version of what you described. Using id-mapped mounts enables you and the unprivileged user account both to access the same directory with the same credentials, but you didn't need to decide in advance which directories you wanted to expose. Also, things like disabling setuid and using pid namespaces provide an additional measure of isolation beyond what you get from another account.

  • You're not wrong, but this will require managing file permissions (groups and the like), and new files will by default be owned by the claude user instead of your regular user. I tried this early on and quickly decided it wasn't worth it (to me). Your mileage may vary, of course.

    • True. I just maintain separate /home/claude/src/proj and /home/me/src/proj dirs so the human workspace and the robot workspaces stay separate. We then use git to collaborate.
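A minimal sketch of the separate-account approach from this subthread, assuming a dedicated `claude` user already exists (created once, with root) and that your account is allowed to `sudo -u claude`. The function names are made up for illustration.

```python
import subprocess

def agent_argv(agent_user, agent_cmd):
    """argv that runs agent_cmd as agent_user. -H points HOME at the agent
    user's home directory, so its config and caches stay out of yours."""
    return ["sudo", "-u", agent_user, "-H", "--", *agent_cmd]

def run_as_agent(agent_user, project_dir, agent_cmd):
    # Without -i, sudo keeps the caller's working directory, so cwd= picks
    # where the agent starts; agent_user must be able to enter project_dir.
    return subprocess.run(agent_argv(agent_user, agent_cmd), cwd=project_dir)
```

For example, `run_as_agent("claude", "/home/claude/src/proj", ["claude"])`; file permissions then come from the kernel, and you pull the agent's work into your own checkout with git.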

I've had issues with the sandbox feature, both on Linux (Arch Linux) and on two macOS machines (Tahoe). There is an open issue[1] on the claude-code issue tracker for it.

I'm not saying it is broken for everyone, but please verify that it works before trusting it, e.g. by instructing Claude to attempt to read from somewhere it shouldn't be allowed to.

From my side, I confirmed that bubblewrap and seatbelt both work independently, but through claude-code they don't, even though claude-code reports them as active when debugging.

[1] https://github.com/anthropics/claude-code/issues/32226
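In the spirit of "verify before trusting", a small probe script you could run once yourself and once by asking Claude to run it through its Bash tool, then compare. The default target paths are only examples; pass the ones your config is supposed to block.

```python
import os
import sys

def probe_read(path):
    """Try to read one byte of `path`; return a short verdict, never raise."""
    path = os.path.expanduser(path)
    try:
        with open(path, "rb") as f:
            f.read(1)
        return "readable"
    except FileNotFoundError:
        # Inside a sandbox this can also mean the path was hidden
        # (e.g. an empty tmpfs mounted over it), so compare with a normal run.
        return "missing"
    except PermissionError:
        return "denied"
    except OSError as e:
        return f"error: {e.strerror}"

if __name__ == "__main__":
    targets = sys.argv[1:] or ["~/.ssh/config", "~/.bashrc"]
    for t in targets:
        print(f"{probe_read(t):>10}  {t}")
```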

Is this a real sandbox or just a pretty please?

Also, a lot of people use multiple harnesses. I'm often switching between claude, codex, and opencode. It's kind of nice to have the sandbox policy independent of the actual AI assistant you are running.

Interesting, thanks. I use remote ephemeral dev containers with isolated envs, so filesystem damage isn't really a concern as long as the PR looks good in review. Nice extra guardrail though, will add it to the project-level settings.

  • I use local dev containers: the worst an agent can do is delete its working copy; it has no access to my home directory, access tokens or sudo.

I’m surprised it works for you with such a simple config? I’m the one that added the allowRead option to Claude’s underlying sandbox [0] and had quite a job getting my toolchains and skills to work with it [1].

[0] Fun to see the confusing docs I wrote show up more or less verbatim on Claude’s docs.

[1] My config is here, may be useful to someone: https://github.com/carderne/pi-sandbox/blob/main/sandbox.jso...

It’s cute because Claude has discretion to disable its own sandbox, and it does.

  • > You can disable this escape hatch by setting "allowUnsandboxedCommands": false in your sandbox settings. When disabled, the dangerouslyDisableSandbox parameter is completely ignored and all commands must run sandboxed or be explicitly listed in excludedCommands.

    https://code.claude.com/docs/en/sandboxing

    (I have no idea why that isn't the default because otherwise the sandbox is nearly pointless and gives a false sense of security. In any case, I prefer to start Claude in a sandbox already than trust its implementation.)
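If you do want the stricter behavior quoted above, a small sketch that turns it on in a project's `.claude/settings.json` without clobbering other keys. It assumes the key lives inside the `"sandbox"` object, as "in your sandbox settings" suggests; double-check against the sandboxing docs for your version.

```python
import json
from pathlib import Path

def disable_sandbox_escape(settings_path):
    """Set sandbox.allowUnsandboxedCommands = false, creating the file and
    keys if needed and preserving everything else in the file."""
    path = Path(settings_path)
    settings = json.loads(path.read_text()) if path.exists() else {}
    sandbox = settings.setdefault("sandbox", {})
    sandbox.setdefault("enabled", True)  # assumption: enable if not set at all
    sandbox["allowUnsandboxedCommands"] = False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n")
    return settings
```

Usage: `disable_sandbox_escape(".claude/settings.json")` from the project root.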

Did you get this to work with docker, where the agent/dev env runs on the host machine but the stack itself runs via docker compose?

Many of the projects I work on follow this pattern (and I’m not able to make bigger changes to them), and sandboxing breaks immediately when I need to `docker compose run sometask.sh`.

You also have to worry about exec and other neat ways to get around stuff. You could also spin up YAD (yet another docker) and run Claude in there with your git repo cloned into it; short of some state-level-actor escapes, it should cover 99% of your most basic failures.

Interesting point. I've been running an autonomous multitalented AI agent (Aegis) on a $100 Samsung A04e. It manages 859 referring sites without touching the local filesystem much. Efficiency over hardware works.

For some reason, this made everything worse for me. Now claude constantly tries to access my home folder instead of the current directory. Obviously this is still not good enough. Also, Claude keeps ignoring my instructions not to read my home directory and to use the current directory. Weird.

  • The problem with all these LLM instructed security features is the `codeword` poison probability.

    The way LLMs process instructions isn't intelligence as we humans know it, but rather a probability that an instruction will lead to a given output.

    When you don't mention $HOME in the context, the probability that it will do anything with $HOME remains low. However, if you mention it in the context, the probability suddenly increases.

    No amount of additional context will restore the probability you had before the context was poisoned by mentioning it. Mentioning $HOME completely changes the probabilities.

    These coding harnesses aren't enough to secure a safe operating environment because they inject poison context that _NO_ amount of textual context can rewire.

    You just lost the game.

And you'd trust that given CC is a vibe-coded mess?

Editing to go even further because, I gotta say, this is a low point for HN. Here's a post with a real security tool and the top comment is basically "nah, just trust the software to sandbox itself". I feel like IQ has taken a complete nosedive in the past year or so. I guess people are already forgetting how to think? Really sad to see.

It's common practice to ask the agent to refer to another project; in that case I guess the read rule should point to the root folder containing the projects.

Also, any details on how this is enforced? I've noticed that Claude on Windows doesn't always respect plan mode; it has edited files in plan mode. I've never faced that issue on Linux, though.

So what does this do exactly? If it used "default deny" or "default allow" you wouldn't have both allow and deny rules...
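Nobody in the thread spells out the precedence, so here is one plausible model, purely as an assumption and not Claude Code's documented behavior: the most specific matching rule wins, ties go to deny, and a default applies when nothing matches. Under that model, having both lists makes sense: in the config at the top, `"."` re-allows the project even though it sits inside the denied `~/`. The function names are hypothetical.

```python
import os

def _resolve(p, cwd, home):
    # Expand a leading "~", then make the path absolute relative to cwd.
    if p == "~" or p.startswith("~/"):
        p = home + p[1:]
    return os.path.normpath(os.path.join(cwd, p))

def _covers(rule, target):
    # A rule covers itself and everything beneath it.
    return target == rule or target.startswith(rule.rstrip("/") + "/")

def is_allowed(path, allow, deny, cwd, home, default=False):
    """Hypothetical model: longest matching rule wins, ties go to deny."""
    target = _resolve(path, cwd, home)
    best_len, verdict = -1, default
    for rule in allow:
        r = _resolve(rule, cwd, home)
        if _covers(r, target) and len(r) > best_len:
            best_len, verdict = len(r), True
    for rule in deny:
        r = _resolve(rule, cwd, home)
        if _covers(r, target) and len(r) >= best_len:
            best_len, verdict = len(r), False
    return verdict
```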

FYI, this doesn’t always work as expected. Try asking Claude to read “~/.ssh/config” with these settings and it will happily do it.

Specifically, it only works for spawned processes and not builtin tools.
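Since built-in tools like Read apparently bypass the sandbox, a PreToolUse hook (the mechanism mentioned upthread) can at least cover them. A minimal sketch, assuming the hook receives the tool call as JSON on stdin with `tool_name` and `tool_input` (`command` for Bash, `file_path` for Read) and that exiting with code 2 blocks the call and sends stderr back to Claude, as the Claude Code hooks docs describe; verify both against your version. The patterns and directory list are hypothetical, and like any string matching this is a speed bump (`bash -c "/usr/bin/rm ..."` slips past it), not security.

```python
import json
import os
import re
import sys

# Hypothetical policy: bare rm (optionally via sudo) at the start of a command
# or after ; & | (, plus find ... -delete. `git rm` is deliberately allowed.
BLOCKED_COMMAND = re.compile(r"(?:^|[;&|(])\s*(?:sudo\s+)?rm\s|\bfind\b.*\s-delete\b")
SENSITIVE_DIRS = [os.path.realpath(os.path.expanduser(p))
                  for p in ("~/.ssh", "~/.aws", "~/.gnupg")]

def check(event):
    """Return a reason string if the tool call should be blocked, else None."""
    tool = event.get("tool_name")
    args = event.get("tool_input", {})
    if tool == "Bash" and BLOCKED_COMMAND.search(args.get("command", "")):
        return "STOP. Destructive command blocked by policy hook. Ask the user."
    if tool == "Read":
        path = os.path.realpath(os.path.expanduser(args.get("file_path", "")))
        for d in SENSITIVE_DIRS:
            if path == d or path.startswith(d + os.sep):
                return f"Reading {path} is blocked by policy hook."
    return None

def main():
    reason = check(json.load(sys.stdin))
    if reason:
        print(reason, file=sys.stderr)
        sys.exit(2)  # assumed: exit code 2 blocks the call, stderr goes to Claude
    sys.exit(0)
```

To use it as a hook, save it as a script with `if __name__ == "__main__": main()` at the bottom and register it under `hooks.PreToolUse` in settings.json (check the hooks docs for the exact shape).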

I use bwrap (bubblewrap) to sandbox Claude. It works very well and gives me a lot of control over, and certainty about, the sandbox.

I'm now considering installing Qubes OS for all dev work, to make absolutely sure that all coding agents run in separate, secure sandboxes without any exposure of the host OS.

Does this also apply to the commands or programs that it runs?

e.g. if it writes a script or program with a bug which affects other files, will this prevent it from deleting or overwriting them?

What about if the user runs a program the agent wrote?

I noticed codex has a sandbox, wondering if it has a comparable config section.

  • Codex uses and ships with bubblewrap on Linux. It will attempt to use the version installed on the PATH before falling back to the shipped version with a warning message.

    You should be able to configure the sandbox using https://developers.openai.com/codex/agent-approvals-security if you are a person who prefers the convenience of codex being able to open the sandbox over an externally enforced sandbox like jai.

lol if you think Claude is smart enough to block sneaky path strings based on your config.
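For what it's worth, the check itself isn't hard to get right; the problem is who enforces it. A minimal sketch of a containment test that resolves `..` and symlinks before comparing (`is_within` is a made-up name). Even this is only advisory: there is a race between checking a path and using it (a symlink can be swapped in between), which is one more reason to prefer kernel-enforced sandboxes over string checks.

```python
import os

def is_within(root, candidate):
    """True if `candidate` (absolute, or relative to root) resolves to root or
    somewhere beneath it. realpath collapses '..' and follows symlinks, so
    'safe/../../x' and symlinks pointing outside root are both rejected."""
    root = os.path.realpath(root)
    target = os.path.realpath(os.path.join(root, candidate))
    return os.path.commonpath([root, target]) == root
```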