Comment by mazieres

20 hours ago

I've seen claude get confused about what directory it's in. And of course I've seen claude run rm -rf *. Fortunately not both at the same time for me, but not hard to imagine. The claude sandbox is a good idea, but to be effective it would need to be implemented at a very low level and enforced on all programs that claude launches. Also, claude itself is an enormous program that is mostly developed by AI. So having a small <3000-line human-implemented program as another layer of defense offers meaningful additional protection.

62 comments


giancarlostoro 19 hours ago

In my opinion Claude should be shipped with a custom implementation of "rm" that Anthropic can add guardrails to. Same with "find"; I'm surprised they don't just embed ripgrep (which is what VS Code does). It's really surprising they don't just tweak what Claude uses and lock it down to where it cannot be harmful. Ensure it only ever calls tooling Claude Code provides.

nananana9 12 hours ago

Oh, rm failed, since we're running in a weird environment! Let me retry with `bash -c "/usr/bin/rm -rf *"`!
throwaway2027 15 hours ago
All of which is useless when it just starts using big blocks of python instead. You need filesystem sandboxing for the python interpreter too.
- giancarlostoro 4 hours ago
  
  If you disallow it, at the level of its core system training, from just writing Python scripts to bypass its defined environment, why would this matter? I would lock down its PATH: anything that tries to call Python should require the end user to see and approve the raw script before it runs.
  
- ethanwillis 15 hours ago
  
  What we need is a capability-based security system. It could write all the Python, asm, or whatever it wants, and it wouldn't matter at all if it was never given a reference to something it shouldn't use.
  
lxgr 13 hours ago

> a custom implementation of "rm" that Anthropic can add guardrails to
Wrong layer. You want the deletion to actually be impossible from a privilege perspective, not merely made harder in practice for the entity that shouldn't delete something.
Claude definitely knows how to reimplement `rm`.
torginus 11 hours ago

Why can't you ship with OverlayFS, which actually enforces these restrictions?
I have seen the AI break out of my (admittedly flimsy) guards, either simply with
safepath/../../stuff or with something more convoluted, like symlinks.
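
A minimal sketch of the escape described above, in Python. It contrasts a guard that only compares path strings with one that resolves the path first (collapsing `..` and following symlinks). The function names and the throwaway `safepath`/`stuff` directories are hypothetical, and this is an application-level check only, not the OS-level enforcement discussed elsewhere in the thread.

```python
import os
import tempfile
from pathlib import Path


def naive_is_inside(root: str, candidate: str) -> bool:
    """Flawed guard: compares strings without resolving '..' or symlinks."""
    return candidate.startswith(root.rstrip(os.sep) + os.sep)


def resolved_is_inside(root: str, candidate: str) -> bool:
    """Resolve both paths (collapse '..', follow symlinks) before comparing."""
    root_path = Path(root).resolve()
    candidate_path = Path(candidate).resolve()
    return candidate_path == root_path or root_path in candidate_path.parents


def demo() -> dict:
    """Build a temporary tree and run both guards against two escapes."""
    with tempfile.TemporaryDirectory() as tmp:
        safe = os.path.join(tmp, "safepath")      # hypothetical allowed dir
        outside = os.path.join(tmp, "stuff")      # hypothetical forbidden dir
        os.mkdir(safe)
        os.mkdir(outside)

        traversal = os.path.join(safe, "..", "stuff")
        link = os.path.join(safe, "link")
        os.symlink(outside, link)                 # symlink pointing outside

        return {
            "naive_traversal": naive_is_inside(safe, traversal),
            "resolved_traversal": resolved_is_inside(safe, traversal),
            "naive_symlink": naive_is_inside(safe, link),
            "resolved_symlink": resolved_is_inside(safe, link),
            "resolved_inside": resolved_is_inside(
                safe, os.path.join(safe, "file.txt")
            ),
        }


if __name__ == "__main__":
    for name, allowed in demo().items():
        print(f"{name}: {allowed}")
```

The string-based guard accepts both escapes, while the resolving guard rejects them and still accepts a file that is really inside the directory. A check like this can still be raced (the path can change between the check and the use), which is one reason to prefer enforcement below the process.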
eru 16 hours ago
> It's really surprising they don't just tweak what Claude uses and lock it down to where it cannot be harmful. Ensure it only ever calls tooling Claude Code provides.
That would make it far less useful in general.
- KronisLV 15 hours ago
  
  Maybe Anthropic (or some collection of the large AI orgs, like OpenAI and Anthropic and Google coming together) should apply patches on top of (or fork altogether) the coreutils and whatever you normally get in a userland - a bit like what you get in Git Bash on Windows, just with:
  1) more guardrails in place
  2) maybe more useful error messages that would help LLMs
  3) no friction with needing to get any patches upstreamed
  External tool calling should still be an option ofc, but having utilities that are usable just like what's in the training data, but with more security guarantees and more useful output that makes what's going on immediately obvious would be great.
  
walthamstow 15 hours ago
Claude has told me that its Grep tool does use rg under the hood, but I constantly find it using the Bash tool with grep
- giancarlostoro 4 hours ago
  
  When I tell it to use rg, it goes much faster than when it uses grep. I really don't understand why it's slower with grep.
oefrha 17 hours ago
You can define your own rm shell alias/function and it will use that. I also have cp/mv aliases that force -i to avoid accidental clobbering, and it confuses Claude to no end (it uses cp/mv rarely enough, more rarely than it should, really, that I don’t bother wasting memory tokens on it).
- d1sxeyes 17 hours ago
  
  I did this, Claude detected it and decided to run /bin/rm directly.
  
troupo 15 hours ago

> Claude should be shipped with a custom implementation of
And when that fails for some reason it will happily write and execute a Python script bypassing all those custom tools

mroche 14 hours ago

> The claude sandbox is a good idea, but to be effective it would need to be implemented at a very low level and enforced on all programs that claude launches.

I feel like an integration with bubblewrap, the sandboxing tech behind Flatpak, could be useful here. Have all executed commands wrapped with a BW context to prevent and constrain access.

https://github.com/containers/bubblewrap

r4indeer 14 hours ago
Bubblewrap is exactly what the Claude sandbox uses.
> These restrictions are enforced at the OS level (Seatbelt on macOS, bubblewrap on Linux), so they apply to all subprocess commands, including tools like kubectl, terraform, and npm, not just Claude’s file tools.
https://code.claude.com/docs/en/sandboxing
- Melonai 10 hours ago
  
  Oh wow, I'd have expected them to vibe-code it themselves. Props to them: bubblewrap is really solid. Despite all my issues with the things built on top of it (Flatpak with its infinite XDG portals, all for some reason built on D-Bus, which extremely unluckily became the primary and only really viable IPC protocol on Linux), bwrap still makes a great foundation, and I've never had a problem with it in particular. I tend to use it a bunch with NixOS, and I often see Steam invoking it to support all of its runtimes. It's containers but actually good.
- mroche 14 hours ago
  
  The more you know, thanks for the information!

PaulDavisThe1st 20 hours ago

On Linux, chroot(2) is hard to escape and would apply to all child processes without modification.

safety1st 16 hours ago
We anthropomorphize these agents in every other way. Why aren't we using plain ol' unix user accounts to sandbox them?
They look a lot like daemons to me: they're programs that you want hanging around ready to respond, and maybe acting autonomously through cron jobs or similar. You want to assign any number of permissions to them, but you don't want them to have access to root or necessarily any of your personal files.
It seems like the permissions model broadly aligns with how we already handle a lot of server software (and potentially malicious people) on unix-based OSes. It is a battle-tested approach that the agent is unlikely to be able to "hack" its way out of. I mean we're not really seeing them go out onto the Internet and research new Linux CVEs.
Have them clone their own repos in their own home directory too, and let them party.
Openclaw almost gets there! It exposes a "gateway" which sure looks like a daemon to me. But then for some reason they want it to live under your user account with all your privileges and in a subfolder of your $HOME.
- lxgr 12 hours ago
  
  > for some reason they want it to live under your user account
  The entire idea of Openclaw (i.e., the core point of what distinguishes it from agents like Claude Code) is to give it access to your personal data, so it can act as your assistant.
  If you only need a coding agent, Openclaw is the completely wrong tool. (As a side note, after using it for a few weeks, I'm not convinced it's the right tool for anything, but that's a different story.)
- gwking 9 hours ago
  
  I tried this with Claude Code on macOS. I created a new agent user and a wrapper to run Claude as that user, along with some scripts to set permissions and ownership so that I could run simple allow/deny commands. The only problem was that the fancy OAuth flow broke. I filed an issue with Anthropic and their ticket bot auto-closed it “for lack of interest” or whatever.
  I fiddled with transferring the saved token from my keychain to the agent user keychain but it was not straightforward.
  If someone knows how to get a subscription to Claude to work on another user via command line I’d love to know about it.
- jon-wood 15 hours ago
  
  Oh that’s an idea. I was going to argue that it’s a problem that you might want multiple instances in different contexts but sandboxing processes (possibly instanced) is exactly what systemd units are designed to deal with.
- search_facility 16 hours ago
  
  Exactly!
wasted_intel 9 hours ago

That comparison is made on the project homepage:
"Not a security mechanism. No mount isolation, no PID namespace, no credential separation. Linux documents it as not intended for sandboxing."
shakna 20 hours ago
chroot is not a security sandbox. It is not a jail.
Escaping it is something that does not take too much effort. If you have ptrace, you can escape without privileges.
- brianush1 19 hours ago
  
  claude is stupid but not malicious; chroot is sufficient
  

esperent 19 hours ago

I added a hook to disable rm, find -delete, and a few of the other more obvious destructive ops. It sends Claude a strongly worded message: "STOP IMMEDIATELY. DO NOT TRY TO FIND WORKAROUNDS...".

It works well. Git rm is still allowed.
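
A rough sketch, in Python, of the kind of check such a hook could run. This is not the hook described above: the function names and rules are hypothetical, and how the result would be wired into Claude Code's hook mechanism is deliberately left out. It blocks `rm` (including `/bin/rm`) and `find ... -delete` in any part of a chained command, while leaving `git rm` alone.

```python
import os
import shlex
from typing import Iterator, List, Optional

SEPARATOR_CHARS = set("();<>|&")  # shell operators that end a simple command


def simple_commands(command: str) -> Iterator[List[str]]:
    """Split a shell line into argv lists at operators such as '&&' and ';'."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    current: List[str] = []
    for token in lexer:
        if token and all(ch in SEPARATOR_CHARS for ch in token):
            if current:
                yield current
                current = []
        else:
            current.append(token)
    if current:
        yield current


def blocked_reason(command: str) -> Optional[str]:
    """Return why the command is blocked, or None if it is allowed."""
    try:
        commands = list(simple_commands(command))
    except ValueError:
        # Unbalanced quotes and similar: refuse rather than guess.
        return "could not parse command"
    for argv in commands:
        program = os.path.basename(argv[0])
        if program == "rm":
            return "rm is disabled"
        if program == "find" and "-delete" in argv[1:]:
            return "find -delete is disabled"
    return None


if __name__ == "__main__":
    for cmd in ["git rm old.txt", "cd build && /bin/rm -rf *"]:
        print(cmd, "->", blocked_reason(cmd))
```

Because it only inspects the program name of each simple command, this sketch does not see a deletion wrapped in another interpreter, such as `bash -c "/usr/bin/rm -rf *"` or a Python script. That makes it a guard against accidents rather than a security boundary.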

Diti 17 hours ago

I added something similar. Claude eventually ran an `rm -rf *` on my own project. When I asked why it did that, it recognized it messed up and offered a very bad “apology”: “the irony of not following your safety instructions isn’t lost on me”.
Nowadays I only run Claude in Plan mode, so it doesn’t ask me for permissions any more.
lxgr 13 hours ago
It works well so far, for you.
Are you confident it would still work against sophisticated prompt injection attacks that override your "strongly worded message"?
Strongly worded signs can be great for safety (actual mechanisms preventing undesirable actions from being taken are still much better), but are essentially meaningless for security.
- unshavedyak 8 hours ago
  
  Not sure about OP's impl, but the wording doesn’t matter. The hook prevents the use of whatever action you want. E.g. it’s impossible for Claude to use emojis for me. My hook doesn’t allow it.
  So it’s deterministic, based on however the script is written.
- esperent 11 hours ago
  
  I mean, that's like asking whether you're sure that your antivirus would prevent every possible virus. Are you sure that you haven't made some mistake in your dev box setup that would allow a hacker to compromise it? What if a thief broke into your house and stole your laptop? That's happened to me before, and it's much more annoying to recover from than an accidental rm -rf.
  I do my best to keep off-site backups and don't worry about what I can't control.
  

thehours 16 hours ago

I added this to `~/.claude/settings.json`:

"env": { "CLAUDE_BASH_MAINTAIN_PROJECT_WORKING_DIR": "1" },

> Working directory persists across commands. Set CLAUDE_BASH_MAINTAIN_PROJECT_WORKING_DIR=1 to reset to the project directory after each command.

It reduces one problem - getting lost - but it trades it off for more complex commands on average since it has to specify the full path and/or `cd &&` most of the time.

[0] https://code.claude.com/docs/en/tools-reference#bash-tool-be...

digikata 13 hours ago

One could run a docker container with claude code, with a bind to the project directory. I do that but also run my docker daemon/container in a Linux VM.

martenlienen 16 hours ago

That is exactly what it is. In the docs, it says that they use bubblewrap to run commands in a container that enforces file and network access at the system level.

calvinmorrison 5 hours ago

Pledge might be useful here
