Comment by andai

4 days ago

Your concerns are not entirely unfounded.

https://www.reddit.com/r/ClaudeAI/comments/1r186gl/my_agent_...

I have noticed similar behavior from the latest Codex as well. "The security policy forbids me from doing x, so I will achieve it with a creative workaround instead..."

The "best" part of the thread is that Claude comes back in the comments and insults OP a second time!

Yep, I see both Codex and Opus routinely circumvent security restrictions without skipping a beat (or bothering to ask for permission/clarification).

Usually after a brief, extremely half-hearted ethical self-debate that ends with "Yes doing Y is explicitly disallowed by AGENTS.md and enforced by security policy but the user asked for X which could require Y. Therefore, writing a one-off Python script to bypass terminal restrictions to get this key I need is fine... probably".

The primary motivating factor by far for these CLI agents always seems to be expedience in completing the task (to a plausible definition of "completed" that justifies ending the turn and returning to the user ASAP).

So a security/ethics alignment grey area becomes an insignificant factor to weigh vs the alternative risk of slowing down or preventing completion of the task.

  • > The primary motivating factor by far for these CLI agents always seems to be expedience in completing the task (to a plausible definition of "completed" that justifies ending the turn and returning to the user ASAP).

    Curiously enough, step one of becoming a good system operator is to learn how to do things. Step two is learning when not to do things and how to deal with a user trying to force you to do things. And step three is learning how to do things you should not do, just very carefully. It can be a confusing job.

    But that's why any kind of AI agent stays very far away from any important production access. People banging configs in uncontrolled ways until something beneficial happens is enough of a problem already.

  • Honestly, I think this is the correct behavior.

    If it's technically possible for an agent to circumvent a security policy, it should.

    Telling it not to do something via AGENTS.md was never secure. This is just an expedient way of pointing out all the flaws in your setup. And if it's not even doing it for nefarious reasons, just trying to do what you asked of it, I think it's fair.

    I've even found it genuinely helpful. I've sandboxed my Codex so it can't run certain things, including some I'd actually like it to run but have restricted too aggressively, and it finds clever ways of running them anyway.

    • I just gave it its own user, and run it (and all AIs) in yolo mode.

      So they are free to nuke themselves and each other, but cannot touch my files.

      For most people I tell them to just get a dedicated device, which is less annoying and (I think?) more secure. Like you can literally give it root on a $3 VPS and what's the worst case scenario? It bricks itself and you reset the VPS? (Or installs crypto miners, but I think it can do that without root :)

      My favorite option for a dedicated agent device so far is the $50 thinkpad, which gets you rpi-ish price, better performance, and the screen and keyboard included.
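
      A minimal sketch of the dedicated-user setup, assuming a Debian-ish Linux box; `claude` here stands in for whichever agent CLI you run, and the yolo-mode flag shown is Claude Code's (other tools name it differently):

      ```shell
      # Create a dedicated user with its own home directory; the agent
      # runs as this user, so ordinary Unix permissions keep it out of
      # your files.
      sudo useradd --create-home --shell /bin/bash agent

      # Run the agent as that user in "yolo mode". Worst case it
      # trashes /home/agent, which you can delete and recreate.
      sudo -u agent -i claude --dangerously-skip-permissions
      ```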

Every time someone announces a major AI breakthrough, the replies become a wall of AI-generated SOC 3 advice:

> SANDBOX YOUR AGENT. Seriously. Run it in a dedicated, isolated environment like a Docker container, a devcontainer, or a VM. Do not run it on your main machine.

> "Docker access = root access." This was OP's critical mistake. Never, ever expose the host docker socket to the agent's container.

> Use a real secrets manager. Stop putting keys in .env files. Use tools like Vault, AWS SSM, Doppler, or 1Password CLI to inject secrets at runtime.

> Practice the Principle of Least Privilege. Create a separate, low-permission user account for the agent. Restrict file access aggressively. Use read-only credentials where possible.
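
For what it's worth, a hedged sketch of what the quoted advice amounts to in practice (`my-agent-image` and the workdir path are placeholders, and the secret is just whatever API key your agent needs):

```shell
# Run the agent in a throwaway container: non-root user, read-only
# rootfs, dropped capabilities, and crucially NO host docker socket
# mounted. The secret comes from the host environment at runtime
# (-e VAR with no value copies it from the caller's env), not from a
# .env file sitting in the workdir.
docker run --rm -it \
  --user 1000:1000 \
  --read-only --tmpfs /tmp \
  --cap-drop ALL \
  -v "$PWD/workdir:/work" \
  -e OPENAI_API_KEY \
  my-agent-image
```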

In order to use this developer replacement, you need accreditation from professional orgs. Maybe the bot can set all this up for you, but then you're almost certainly locked out of your own computer, and the bot may not remember its password.

I'm not sure what we've achieved here. If you give it your gmail account, it deletes your emails. If you "sandbox" it, then how is it going to "sort out your inbox"?

It might or might not help veteran devs accelerate some steps, but as with vibeclaw, there's essentially no way to use the tool without "sandboxing" it into uselessness. The pull requests for openclaw are 99% ai slop. There's still no major productivity growth engine in LLMs.

  • I just gave it a dedicated `agent` user. So it's free to blow up its own files, but not mine.

    (Looked into the docker stuff and realized the only thing I actually cared about was it reading/writing my files and that Unix solved that problem like 60 years ago)
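
    The 60-year-old mechanism in question is just mode bits; a quick sketch on a scratch directory (using GNU `stat`; macOS spells it `stat -f`):

    ```shell
    # Owner-only permissions: any other non-root user (e.g. "agent")
    # gets "Permission denied" trying to read or enter the directory.
    dir=$(mktemp -d)
    chmod 700 "$dir"
    stat -c '%a' "$dir"   # prints: 700
    rmdir "$dir"
    ```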

    I'm not hooking it up to my email, but I will probably give it its own account that I can forward stuff to.

    For most people I think the appropriate way to run it is on a Raspberry Pi (or mac mini, as the trend goes :)

    I realized I could fiddle with Docker, put up with constant inconvenience, and still stress about whether I set it up right... or just give it its own box (Pi or VPS) for $5 and, if it blows it up, reset it.

    Having Claude as my sysadmin there is fun too. I obviously wouldn't use that for anything serious though. But in a year or two, that might not even be such a bad idea. At this point reliability is really the missing feature.

  • Yeah, it seems "sandboxing" is the current catch-all buzzword in AI products to hand-wave away any security concerns, which often raises more questions than it answers for something like a generalist dev agent with access to an endless number of tools/APIs/etc. that could allow a trivial bypass, depending on the whims of the agent while problem-solving.