Comment by wassel

13 days ago

I think a lot of teams are realizing that “agent sandboxing” isn’t just about isolation; it’s about making long-running agent work actually converge.

In practice, agents don’t fail only because the model is wrong. They fail because the environment is flaky: missing deps, slow setup, weird state, unclear feedback loops. If you give an agent an isolated, secure environment that’s already set up for the repo, you remove a ton of friction and iterations become much more reliable.

The other piece is “authority” / standards. You can write guidelines, but what keeps agents (and humans) aligned is the feedback: tests, linters, CI rules, repo checks. Centralizing those standards and giving the agent a clean place to run them makes compliance much more deterministic.

We built this internally for our own agent workflows and we’re debating whether it’s worth offering the sandbox part as a standalone service (https://envs.umans.ai), because it feels like the part everyone ends up rebuilding.
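
To make the "feedback as authority" point concrete, here is a minimal sketch of running a repo's checks and handing the agent a structured pass/fail report. It is illustrative only and not how envs.umans.ai works; `run_checks` and the two demo commands are hypothetical stand-ins for a repo's real test and lint commands.

```python
import subprocess
import sys


def run_checks(checks):
    """Run each named check command and report pass/fail plus captured output."""
    results = {}
    for name, cmd in checks.items():
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[name] = {
            "passed": proc.returncode == 0,
            "output": (proc.stdout + proc.stderr).strip(),
        }
    return results


# Hypothetical checks; a real repo would list its own test and lint commands.
checks = {
    "syntax": [sys.executable, "-c", "compile('x = 1', '<demo>', 'exec')"],
    "failing": [sys.executable, "-c", "import sys; print('lint error'); sys.exit(1)"],
}
report = run_checks(checks)
```

The agent gets the same named signals every run, so "did I comply?" becomes a lookup in `report` rather than a judgment call.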

3 comments


jacobgadek 13 days ago

The "token and time sink" point is huge. I've found that even when agents can install deps, they often get stuck in reasoning loops trying to fix a "build toolchain issue" that is actually just a hallucinated package name.

I built a local runtime supervisor (Vallignus) specifically to catch these non-converging loops. It wraps the agent process to enforce egress filtering (blocking those random pip installs) and hard execution limits so they don't burn $10 retrying a fail state.

It's effectively a "process firewall" for the agentic workflow. Open source if you want to see the implementation: https://github.com/jacobgadek/vallignus
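
For anyone who wants the gist of the "hard execution limits" idea without reading the repo, here is a rough sketch. It is not Vallignus's actual implementation; `run_with_limits` and its parameters are hypothetical, and it covers only attempt and time limits, not egress filtering.

```python
import subprocess


def run_with_limits(cmd, max_attempts=3, timeout_s=30.0):
    """Retry cmd until it succeeds, a limit is hit, or the same failure repeats."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            proc = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout_s
            )
        except subprocess.TimeoutExpired:
            # Wall-clock limit exceeded: stop instead of retrying.
            return {"ok": False, "attempts": attempt, "reason": "timeout"}
        if proc.returncode == 0:
            return {"ok": True, "attempts": attempt, "reason": "success"}
        error = proc.stderr.strip()
        if error == last_error:
            # Identical stderr twice in a row suggests a non-converging loop.
            return {"ok": False, "attempts": attempt, "reason": "repeated failure"}
        last_error = error
    return {"ok": False, "attempts": max_attempts, "reason": "attempt limit"}
```

Comparing consecutive stderr output is a crude convergence signal, but it is cheap and catches the case where the agent keeps retrying the exact same failing command.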

ATechGuy 13 days ago

> They fail because the environment is flaky: missing deps, slow setup, weird state, unclear feedback loops.

Why can't agents install missing deps based on the error message?

wassel 13 days ago

They often try, but two things bite in practice:
- Permissions and sandbox limits. Many agents don’t run on a dev’s laptop with admin access. They run in the cloud or in locked-down sandboxes: no sudo, restricted filesystem, restricted network egress. So “just install it” is sometimes not allowed or not even possible.
- It is a token and time sink, and it is easy to go down the wrong path. Dependency errors are noisy: missing system libs, wrong versions, build toolchain issues, platform quirks. Agents can spend a lot of iterations trying fixes that don’t apply, or that create new mismatches.
Repo-ready environments don’t replace agents installing deps. They just reduce how often they have to guess.
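
As a small illustration of "reducing the guessing", a preflight check can tell the agent up front which tools are absent, instead of letting it infer that from noisy build errors. This is only a sketch; `missing_tools` is a hypothetical helper, and the tool list is whatever the repo actually needs.

```python
import shutil


def missing_tools(required):
    """Return the required executables that are not found on PATH."""
    return [tool for tool in required if shutil.which(tool) is None]
```

Calling `missing_tools(["git", "make"])` before the agent starts gives it one explicit list to act on (or to report back) rather than a chain of failed installs.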