
Comment by orliesaurus

1 month ago

I'm not surprised to see these horror stories...

The `--dangerously-skip-permissions` flag does exactly what it says. It bypasses every guardrail and runs commands without asking you. Some guides I've seen stress that you should only ever run it in a sandboxed environment with no important data ("Claude Code dangerously-skip-permissions: Safe Usage Guide" [1]).

Treat each agent like a non-human identity: give it just enough privilege to perform its task and monitor its behavior ("Best Practices for Mitigating the Security Risks of Agentic AI" [2]).

I go even further. I never let an AI agent delete anything on its own. If it wants to clean up a directory, I read the command and run it myself. It's tedious, BUT it prevents disasters.

ALSO there are emerging frameworks for safe deployment of AI agents that focus on visibility and risk mitigation.

It's early days... but it's better than YOLO-ing with a flag that literally has 'dangerously' in its name.

[1] https://www.ksred.com/claude-code-dangerously-skip-permissio...

[2] https://preyproject.com/blog/mitigating-agentic-ai-security-...

A few months ago I noticed that even without `--dangerously-skip-permissions`, when Claude thought it was restricting itself to directory D, it was still happy to operate on file `D/../../../../etc/passwd`.

That was the last time I ran Claude Code outside of a Docker container.
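
For anyone wanting to do the same, a minimal sketch (the image and mount choices are just one way to do it; `@anthropic-ai/claude-code` is the official npm package):

    # throwaway container that can only see the current project
    docker run --rm -it \
      -e ANTHROPIC_API_KEY \
      -v "$PWD":/work -w /work \
      node:22 \
      bash -c 'npm install -g @anthropic-ai/claude-code && claude'

The `D/../../../../etc/passwd` trick then resolves inside the container, where there's nothing worth destroying.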

  • It will happily run bash commands, which expands its reach pretty widely. It's not limited to file operations and can run system-wide commands with your user permissions.

    • Seems like the best way to limit its ability to destroy things is to run it as a separate user without sudo capabilities, if the job allows.

      That said, running basic shell commands seems like the absolute dumbest way to spend tokens. How much time are you really saving?

  • You don't even need a container. Make claude a local user without sudo permissions, and it will be confined to damaging its own home directory only.
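
    Something like this on a typical Linux box (a sketch; the user name is illustrative, and it assumes the CLI is installed system-wide):

        # dedicated account with its own home and no sudo rights
        sudo useradd --create-home claude
        # run the agent as that user; anything outside /home/claude
        # now needs privileges it doesn't have
        sudo -iu claude claude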

    • The problem is, container-based (or immutable) development environments, like DevContainers and Nix Flakes, still aren't the popular choice for most developers.

      I self-hosted DevPod and Coder, but it was quite tedious to do so. I'm experimenting with Eclipse Che now and I'm quite satisfied with it, except that it is hard to set up: you need a K8S cluster attached to an OIDC endpoint for authentication and authorization, plus a git forge for credentials. The other dealbreaker is that I cannot run the real web version of VSCode on it (it looks like VSCode, but IIRC it is a Monaco-based fork that is almost one-to-one with VSCode without actually being it), so most extensions don't work and I'm limited to OpenVSX. But in exchange I have a pure K8S-based development lifecycle: all my dev environments live on K8S (including temporary port forwarding -- I have wildcard DNS set up for that), so all my work lives on K8S.

      Maybe I could combine a few more open source projects together to make a product.

      4 replies →

While I agree that `--dangerously-skip-permissions` is (obviously) dangerous, it shouldn't be considered completely inaccessible to users. A few safeguards can sand off most of the rough edges.

What I've done is write a PreToolUse hook to block all `rm -rf` commands. I've also seen others use shell functions that intercept `rm` and either print a warning or remap it to `trash`, which lets you recover the files.
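
For reference, the hook is just a script registered under PreToolUse (matcher "Bash") in `.claude/settings.json`; Claude Code pipes the pending tool call to it as JSON on stdin, and exit code 2 blocks the call. A stripped-down version looks roughly like this (`jq` assumed):

    #!/usr/bin/env bash
    # PreToolUse hook: reject recursive force-deletes before they run.
    cmd=$(jq -r '.tool_input.command // empty')
    if grep -Eiq 'rm[[:space:]]+-[a-z]*(r[a-z]*f|f[a-z]*r)' <<<"$cmd"; then
      echo "Blocked: recursive force delete. Ask the human to run this." >&2
      exit 2
    fi
    exit 0

It only catches the common spellings; `--recursive --force` or a separate `-r -f` sail right past, so treat it as a guardrail, not a security boundary.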

  • Does your hook also block "rm -rf" implemented in Python, C, or any other language available to the LLM?

    One obviously safe way to do this is in a VM/container.

    Even then it can do network mischief.

    • I’ve heard of people running “rm -Rf” incorrectly and deleting their backups too, since the NAS was mounted.

      I could certainly see it happening in a VM or container with an overlooked mount.

> Treat each agent like a non-human identity

Why special-case it as a non-human? I wouldn't even give a trusted friend a shell on my local system.

That's exactly why I let the LLM run read-only commands automatically, but anything that could potentially trigger mutation (either removal or insertion) requires manual intervention.

Another way to prevent this is to take a filesystem snapshot at each mutating-command approval (that's where CoW-based filesystems like ZFS and Btrfs would shine), except you also have to block the LLM from deleting your filesystems and snapshots, or dd'ing stuff over your block devices to corrupt them -- and I bet it will eventually escalate into exactly this kind of egregiousness.
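
A sketch of that idea with ZFS (the dataset name is illustrative):

    # cheap CoW checkpoint before each approved mutating command
    snap="tank/home@pre-agent-$(date +%s)"
    sudo zfs snapshot "$snap"
    # ...approved command runs here...
    # and if it went sideways:
    #   sudo zfs rollback "$snap"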

AI tools are honestly unusable without running in yolo mode. You have to baby every single little command. It is utterly miserable and awful.

  • And that is how easily we lose agency to AI. Suddenly even checking the commands that a technology (unavailable until 2-3 years ago) writes for us is perceived as some huge burden...

    • The problem is that it genuinely is. One of the appeals of AI is that you can focus on planning instead of actually running the commands yourself. If you're educated enough to validate what the commands are doing (which you should be if you're trusting an AI in the first place), then having to individually approve pretty much everything the AI does leaves you not much faster than just doing it yourself. In my experience, not running in YOLO mode negates most of the advantages of agents in the first place.

      AI is either an untrustworthy tool that sometimes wipes your computer for a chance at doing something faster than you would've been able to on your own, or it's no faster than just doing it yourself.

      2 replies →

  • Only Codex. I haven't found a sane way to let it access, for example, the Go cache in my home directory (read only) without giving it access EVERYWHERE. Now it does some really weird tricks to have a duplicate cache in the project directory. And then it forgets to do it and fails and remembers again.

    With Claude the basic command filters are pretty good and with hooks I can go to even more granular levels if needed. Claude can run fd/rg/git all it wants, but git commit/push always need a confirmation.

    • Would linking the folder so it thinks it's inside its project directory work?

      That way it doesn't need to go outside of it.
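
      E.g., something like this, assuming the sandbox follows symlinks out of the workspace (many resolve the real path and will still refuse); `~/go/pkg/mod` is Go's default module cache:

          # expose the host's module cache inside the project via a symlink
          ln -s "$HOME/go/pkg/mod" ./.modcache
          # or sidestep symlinks entirely: Go lets the cache live anywhere
          export GOMODCACHE="$PWD/.modcache"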

  • I have to correct a few commands basically every interaction with AI, so I think YOLO mode would get me subpar outcomes.

  • I mean, given the linked reddit post, they are clearly unusable when running in yolo mode, too.

> I'm not surprised to see these horror stories

I am! To the point that I don’t believe it!

You’re running an agentic AI and can parse through logs, but you can’t sandbox or back up?

Like, I’ve given Copilot permission to fuck with my admin panel. It promptly proceeded to bill thousands of dollars, drawing heat maps of the density of built structures in Milwaukee; buying subscriptions to SAP Joule and ArcGIS for Teams; and generating terabytes of nonsense maps, ballistic paths and “architectural sketch[es] of a massive bird cage the size of Milpitas, California (approximately 13 square miles)” resembling “a futuristic aviary city with large domes, interconnected sky bridges, perches, and naturalistic environments like forests, lakes, and cliffs inside.”

But support immediately refunded everything. I had backups. And it wound up hilarious albeit irritating.

  • >> I'm not surprised to see these horror stories

    > I am! To the point that I don’t believe it!

    > You’re running an agentic AI and can parse through logs, but you can’t sandbox or back up?

    When best practices for using a tool involve sandboxing and/or backing up before each use in order to minimize the blast radius, it raises the question: why use it, knowing there is a nontrivial probability one will have to recover from its use any number of times?

    > Like, I’ve given Copilot permission to fuck with my admin panel. It promptly proceeded to bill thousands of dollars ... But support immediately refunded everything. I had backups.

    And what about situations where Claude/Copilot/etc. use were not so easily proven to be at fault and/or their impacts were not reversible by restoring from backups?

    • > why use it, knowing there is a nontrivial probability one will have to recover from its use any number of times?

      Because the benefits are worth the risk. (Even if the benefit is solely sating curiosity.)

      I’m not defending this case. I’m just saying that every one of us has rm -r’d or rm*’d something, and we did it because we knew it saved time most of the time and was recoverable otherwise.

      Where I’m sceptical is that someone who can use the tool is also being ruined by a drive wipe. It reads like well-targeted outrage porn.

      1 reply →

  • Wait, so you've literally experienced these tools going completely off the rails, but you can't imagine anyone using them recklessly? Not to be overly snarky, but have you worked with people before? I fully expect that most people will be careful not to run into this sort of mess, but I'm equally sure that some subset of users will be absolutely asking for it.

  • Can you post the birdcage thing? That sounds fascinating.

    • Literally terabytes of Word and PowerPoint documents displaying and debating various ways to build big bird cages. In Milpitas.

      I noticed the nonsense due to an alert that my OneDrive was over limit, which caught my attention, since I don’t use OneDrive.

      If I prompted a half-decent LLM to run up billables, I doubt I could have done a better job.

      2 replies →

  • ...how is this a serious product that anyone could consider using?

    • > how is this a serious product that anyone could consider using?

      I like Kagi’s Research agent.

      Personally, I was curious about a technology and ready for amusement. I also had local backups. So my give a shit factor was reduced.

      2 replies →