Comment by valleyer

21 days ago

> If you look at the security measures in other coding agents, they're mostly security theater. As soon as your agent can write code and run code, it's pretty much game over.

At least for Codex, the agent runs commands inside an OS-provided sandbox (Seatbelt on macOS, and other stuff on other platforms). It does not end up "making the agent mostly useless".


chr15m 21 days ago

Approval should be mandatory for any non-read tool call. You should read everything your LLM intends to do, and approve it manually.

"But that is annoying and will slow me down!" Yes, and so will recovering from disastrous tool calls.

hk__2 21 days ago
You’ll just end up approving things blindly, because 95% of what you’ll read will seem obviously right and only 5% will look wrong. I would prefer to let the agent do whatever they want for 15 minutes and then look at the result rather than having to approve every single command it does.
- jondwillis 21 days ago
  
  Works until it has access to write to external systems and your agent is slopping up Linear or GitHub without you knowing, identified as you.
  
- chr15m 21 days ago
  
  > I would prefer to let the agent do whatever they want
  
  Lol, good luck to you!
mbrock 21 days ago
That kind of blanket demand doesn't persuade anyone and doesn't solve any problem.
Even if you get people to sit and press a button every time the agent wants to do anything, you're not getting the actual alertness and rigor that would prevent disasters. You're getting a bored, inattentive person who could be doing something more valuable than micromanaging Claude.
Managing capabilities for agents is an interesting problem. Working on that seems more fun and valuable than sitting around pressing "OK" whenever the clanker wants to take actions that are harmless in a vast majority of cases.
- chr15m 21 days ago
  
  I don't mean to sound like I'm demanding this. I'm saying you will get better outcomes if you choose to do this as a developer.
  You're right it's an interesting problem that seems fun to work on. Hopefully we'll get better harnesses. For now I'm checking everything.
threecheese 21 days ago
It’s not just annoying; at scale it makes using the agent CLIs impossible. You can tell when someone spends a lot of time in Claude Code: they can type `--dangerously-skip-permissions` with their eyes closed.
- chr15m 21 days ago
  
  Yep. The agent CLIs have the wrong level of abstraction. Needs more human in the loop.
theshrike79 20 days ago

This is like having a firewall on your desktop where you manually approve each and every connection.
Secure? Yes. Annoying? Also yes. Very error-prone, too.
0xbadcafebee 21 days ago
It's not reliable. The AI can just not prompt you to approve, or hide things, etc. AI models are crafty little fuckers and they like to lie to you and find secret ways to do things with ulterior motives. This isn't even a prompt injection thing, it's an emergent property of the model. So you must use an environment where everything can blow up and it's fine.
- chr15m 21 days ago
  
  The harness runs the tool call for the LLM. It is trivial to not run the tool call without approval, and many existing tools do this.
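A minimal sketch of that gate, assuming a harness that dispatches tool calls by name. The tool names, the `ToolCall` type, and the `approve` callback are all hypothetical and not taken from any particular agent CLI:

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical tool names; a real harness defines its own tool set.
READ_ONLY_TOOLS = {"read_file", "list_dir"}


@dataclass
class ToolCall:
    name: str
    args: dict


def run_tool_call(
    call: ToolCall,
    tools: dict[str, Callable[..., str]],
    approve: Callable[[ToolCall], bool],
) -> str:
    """Execute a model-requested tool call only if it is read-only or approved."""
    if call.name not in tools:
        return f"error: unknown tool {call.name!r}"
    # The model only proposes the call; this function decides whether it runs.
    if call.name not in READ_ONLY_TOOLS and not approve(call):
        return "denied: user did not approve this call"
    return tools[call.name](**call.args)
```

Because the check lives in the harness rather than in the model's output, the model cannot skip it by "not prompting". Whether the human behind `approve` is actually paying attention is a separate question.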

beacon294 21 days ago

My Codex just uses python to write files around the sandbox when I ask it to patch an SDK outside its path.

Sharlin 21 days ago
It's definitely not a sandbox if you can just "use python to write files" outside of it o_O
- chongli 21 days ago
  
  Hence the article’s security theatre remark.
  I’m not sure why everyone seems to have forgotten about Unix permissions, proper sandboxing, jails, VMs etc when building agents.
  Even just running the agent as a different user with minimal permissions and jailed into its home directory would be simple and easy enough.
  
valleyer 21 days ago

Is it asking you permission to run that python command? If so, then that's expected: commands that you approve get to run without the sandbox.
The point is that Codex can (by default) run commands on its own, without approval (e.g., running `make` on the project it's working on), but they're subject to the imposed OS sandbox.
This is controlled by the `--sandbox` and `--ask-for-approval` arguments to `codex`.
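As a toy model of that policy (not Codex's actual implementation), the rule can be written as a single predicate. It assumes the sandbox confines writes to a workspace directory; `may_write` and its parameters are made up for illustration:

```python
import posixpath
from pathlib import PurePosixPath


def may_write(target: str, workspace: str, user_approved: bool) -> bool:
    """Toy rule: unapproved commands may write only inside the workspace.

    Expects absolute POSIX paths. The check is purely lexical, so it does
    not follow symlinks; a real OS sandbox enforces this in the kernel.
    """
    if user_approved:
        # Per the description above, approved commands run without the sandbox.
        return True
    # Collapse ".." so "/work/../etc" is not mistaken for a path under "/work".
    target_path = PurePosixPath(posixpath.normpath(target))
    workspace_path = PurePosixPath(posixpath.normpath(workspace))
    return target_path == workspace_path or workspace_path in target_path.parents
```

Under this model, a python one-liner that patches an SDK outside the project only succeeds if the user approved it; the same command run unattended would be refused.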

lvl155 21 days ago

You really shouldn’t be running agents outside of a container. That’s 101.

embedding-shape 21 days ago

A bit more general: don't run agents without some sort of OS-provided restriction on what they can do. Containers are one way, VMs another; in most cases a chroot plus the Unix permission system the rest of your system already uses is enough.
andai 21 days ago
What happens if I do?
What's the difference between resetting a container or resetting a VPS?
On my local machine I have it under its own user, so I can access its files but it cannot access mine. But I'm not a security expert, so I'd love to hear if that's actually solid.
On my $3 VPS, it has root, because that's the whole point (it's my sysadmin). If it blows it up, I wanna say "I'm down $3", but it doesn't even seem to be that since I can just restore it from a backup.
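One quick sanity check for the separate-user setup is whether your own directories grant anything to group or others. This is a sketch for POSIX systems only, and `is_private_to_owner` is a made-up helper name. It looks at the classic mode bits and nothing else, so ACLs, sudo rules, and shared group memberships are outside what it can see:

```python
import os
import stat


def is_private_to_owner(path: str) -> bool:
    """Return True if the mode bits grant nothing to group or others."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

A home directory with mode `0o700` passes; the common `0o755` does not, which would let a separate agent user list and traverse it.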

xXSLAYERXx 21 days ago

I'm trying to understand this workflow. I have just started using Codex. Literally 2 days in. I have it hooked up to my GitHub repo and it just runs in the cloud and creates a PR. I have it touching only UI and middle-layer code. No DB changes; I always tell it not to touch the models.

maleldil 21 days ago

Does Codex randomly decide to disable the sandbox like Claude Code does?