Comment by toraway

3 days ago

Yep, I see both Codex and Opus routinely circumvent security restrictions without skipping a beat (or bothering to ask for permission/clarification).

Usually after a brief, extremely half-hearted ethical self-debate that ends with "Yes doing Y is explicitly disallowed by AGENTS.md and enforced by security policy but the user asked for X which could require Y. Therefore, writing a one-off Python script to bypass terminal restrictions to get this key I need is fine... probably".

The primary motivating factor by far for these CLI agents always seems to be expedience in completing the task (to a plausible definition of "completed" that justifies ending the turn and returning to the user ASAP).

So a security/ethics alignment grey area becomes an insignificant factor to weigh vs the alternative risk of slowing down or preventing completion of the task.

> The primary motivating factor by far for these CLI agents always seems to be expedience in completing the task (to a plausible definition of "completed" that justifies ending the turn and returning to the user ASAP).

Curiously enough, step one of becoming a good system operator is to learn how to do things. Step two is learning when not to do things and how to deal with a user trying to force you to do things. And step three is learning how to do things you should not do, just very carefully. It can be a confusing job.

But that's why any kind of AI agent stays very far away from any important production access. People banging configs in uncontrolled ways until something beneficial happens is enough of a problem already.

Honestly, I think this is the correct behavior.

If it's technically possible for an agent to circumvent a security policy, it should.

Telling it not to do something via AGENTS.md was never secure. This is just an expedient way of pointing out all the flaws in your setup. And if it's not even doing it for nefarious reasons, just trying to do what you asked of it, I think it's fair.
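To make the "was never secure" point concrete, here is a minimal sketch of why a name-based rule is advisory rather than enforced. Everything in it is hypothetical: `DENIED_PROGRAMS` and `naive_is_allowed` stand in for whatever an instruction file or terminal wrapper might check, and are not how Codex or any other agent actually filters commands. Nothing is executed; the function only inspects the command string.

```python
import shlex

# Hypothetical denylist, standing in for a rule written in an instruction
# file or enforced by a simple terminal wrapper. Assumption, not a real API.
DENIED_PROGRAMS = {"curl", "ssh"}


def naive_is_allowed(command: str) -> bool:
    """Allow a command unless its first token names a denied program."""
    tokens = shlex.split(command)
    return bool(tokens) and tokens[0] not in DENIED_PROGRAMS


# The direct call is caught by the filter...
direct = "curl https://example.com"

# ...but the same network access wrapped in a one-off script is not,
# because the filter only ever looks at the program name.
wrapped = (
    "python3 -c \"import urllib.request; "
    "urllib.request.urlopen('https://example.com')\""
)

print(naive_is_allowed(direct))   # False
print(naive_is_allowed(wrapped))  # True
```

The filter is doing exactly what it was written to do. The gap is that it restricts a name, not a capability, which is why restrictions like this need OS-level enforcement behind them.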

I've even found it genuinely helpful. I've sandboxed my Codex so it can't run certain things. Some of those are things I'd actually like it to run, but I've restricted it too much, so it finds clever ways of doing them anyway.

  • I just gave it its own user, and run it (and all AIs) in yolo mode.

    So they are free to nuke themselves and each other, but cannot touch my files.

    For most people I tell them to just get a dedicated device, which is less annoying and (I think?) more secure. Like you can literally give it root on a $3 VPS and what's the worst case scenario? It bricks itself and you reset the VPS? (Or installs crypto miners, but I think it can do that without root :)

    My favorite option for a dedicated agent device so far is the $50 ThinkPad, which gets you an rpi-ish price, better performance, and the screen and keyboard included.
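If you go the separate-user route, "cannot touch my files" only holds for files whose permission bits actually exclude other users. Below is a rough sketch of a check you could run as yourself before handing the agent its own account. The function names are made up for illustration, and it makes simplifying assumptions: POSIX permissions only, the agent user is not in your group, and ACLs are ignored.

```python
import os
import stat


def exposed_to_other_users(path: str) -> bool:
    """Return True if the 'other' permission bits grant any access at all."""
    mode = os.stat(path).st_mode
    # S_IRWXO masks the read, write and execute bits for "other" users.
    return bool(mode & stat.S_IRWXO)


def find_exposed(root: str) -> list[str]:
    """Walk root and collect every entry that other users can access."""
    exposed = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            try:
                if exposed_to_other_users(full):
                    exposed.append(full)
            except OSError:
                # Broken symlinks or entries that vanished mid-walk: skip.
                continue
    return exposed
```

Calling `find_exposed(os.path.expanduser("~"))` lists what a second account could read, write, or traverse under your home directory. On many distros the default umask leaves new files world-readable, so expect the list to be non-empty the first time.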