
Comment by purpleidea

9 hours ago

I want to like codex, but the quality is just not very good, especially when compared to Claude.

It used to work okay, but a while back they landed a major regression for an entire team of folks I work with.

No response, no workaround.

https://github.com/openai/codex/issues/23762

The pi coding agent offers a decent sandbox and sandbox-override experience. pi-sandbox uses the same sandbox tech that Claude Code uses, although via a fork that's a little behind, and I'm not sure exactly why it uses a fork.

You can install pi, then install pi-sandbox locked to the current version. This issue describes how pi-sandbox plus an additional extension gives you a sandbox by default, with a fallback to unsandboxed execution that requires approval: https://github.com/carderne/pi-sandbox/issues/50

I don’t trust any agent to respect any boundaries. They might today, but tomorrow’s vibe-coded slip update might break that in subtle ways.

My solution to this is to only run agents in a sandbox of my own making (a locked down Podman container).
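For anyone wondering what "locked down" could look like in practice, here is a minimal sketch, not the commenter's actual setup. It only builds and prints a `podman run` command line. The image name `localhost/agent-sandbox:latest` and the agent command `my-agent` are made-up placeholders, and the flag selection is an assumption about what a reasonable baseline might be.

```python
import shlex
from pathlib import Path


def podman_sandbox_argv(workdir, image, agent_cmd):
    """Build (but do not run) a `podman run` command for a restricted container."""
    work = Path(workdir).resolve()
    return [
        "podman", "run", "--rm", "-it",
        # No network at all. Most agents need to reach a model API, so a real
        # setup would likely swap this for a tightly restricted network.
        "--network=none",
        # Drop all Linux capabilities and forbid gaining new privileges.
        "--cap-drop=all",
        "--security-opt=no-new-privileges",
        # Read-only root filesystem, with a scratch tmpfs for temporary files.
        "--read-only",
        "--tmpfs=/tmp",
        # Cap the number of processes the container may spawn.
        "--pids-limit=256",
        # Expose only the project directory (":Z" relabels it for SELinux).
        f"--volume={work}:/work:Z",
        "--workdir=/work",
        image,
        *agent_cmd,
    ]


if __name__ == "__main__":
    # Placeholder image and command, for illustration only.
    argv = podman_sandbox_argv(".", "localhost/agent-sandbox:latest", ["my-agent"])
    print(shlex.join(argv))
```

The point of the design is that every restriction is enforced by the container runtime, so it holds regardless of what the agent decides to do.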

  • They can't respect boundaries as long as those boundaries exist only in the LLM instruction set. For a human being who follows rules long enough, the rules become second nature (usually), almost to the point where long-running companies are known for having rules no one understands (Chesterton's Fence is alive and well).

    But an LLM has a limited "memory". The instructions might land in there with sufficient priority to be "respected", but it takes only a single instance of that memory getting too full, or of the LLM autocompleting the workaround because that was the statistically "best" solution, and any barriers that exist only in LLM instructions rather than in hardcoded guards will evaporate like so much morning fog.

  • I went the full virtual machine route. Just finished hardening the setup and firewalling it off from my local network. Not perfect, but it does make me feel much safer.
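To illustrate the distinction drawn above between barriers that live in the prompt and "hardcoded guards", here is a rough sketch of a guard enforced in code rather than in LLM instructions. The allowlist contents and the `is_allowed` helper are hypothetical, and a check like this is a complement to a sandbox, not a substitute for one.

```python
import shlex

# Hypothetical policy: only these programs may be run by the agent.
ALLOWED_PROGRAMS = {"ls", "cat", "git", "pytest"}
# Hypothetical policy: git is allowed, but not these subcommands.
BLOCKED_GIT_SUBCOMMANDS = {"push", "reset", "clean"}
# Characters that suggest shell chaining, redirection, or substitution.
SHELL_METACHARACTERS = set(";|&<>`$")


def is_allowed(command_line: str) -> bool:
    """Return True only if the command passes the hardcoded policy."""
    try:
        argv = shlex.split(command_line)
    except ValueError:
        # Unbalanced quotes and similar parse errors are rejected outright.
        return False
    if not argv:
        return False
    if any(ch in SHELL_METACHARACTERS for tok in argv for ch in tok):
        return False
    if argv[0] not in ALLOWED_PROGRAMS:
        return False
    if argv[0] == "git" and any(tok in BLOCKED_GIT_SUBCOMMANDS for tok in argv[1:]):
        return False
    return True


if __name__ == "__main__":
    for cmd in ["git status", "git push origin main", "ls && rm -rf /"]:
        print(cmd, "->", "allow" if is_allowed(cmd) else "deny")
```

Unlike an instruction in the prompt, this check runs on every command no matter how full the model's context is. An approved command would still need to be executed as an argument list without a shell for the check to mean anything.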