
Comment by tptacek

13 hours ago

There are other models. Eschew the sandbox. Give the agent a computer, with all the trimmings, but keep that computer segregated from sensitive resources. Tokens are a solved problem: tokenize them[1] or do something equivalent with a proxy. The same thing goes for secrets.
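Something like this toy sketch, say (this is not Fly.io's actual tokenizer, and every name in it is illustrative): the agent only ever holds an opaque placeholder, and a trusted proxy outside the sandbox swaps in the real credential in flight.

```python
# Toy sketch of "tokenize your tokens": the agent only ever sees a
# placeholder; a trusted proxy substitutes the real secret on the way out.
# Illustrative only -- not Fly.io's tokenizer API.
import secrets


class TokenizingProxy:
    def __init__(self):
        # placeholder -> real secret; this mapping lives outside the agent's reach
        self._vault = {}

    def tokenize(self, real_secret: str) -> str:
        placeholder = "tok_" + secrets.token_hex(8)
        self._vault[placeholder] = real_secret
        return placeholder  # this is all the agent ever holds

    def forward(self, headers: dict) -> dict:
        # Called on every outbound request: swap placeholders for real secrets.
        out = dict(headers)
        auth = out.get("Authorization", "")
        for placeholder, real in self._vault.items():
            if placeholder in auth:
                out["Authorization"] = auth.replace(placeholder, real)
        return out


proxy = TokenizingProxy()
fake = proxy.tokenize("ghp_real_github_token")
sent = proxy.forward({"Authorization": f"Bearer {fake}"})
assert "ghp_real_github_token" in sent["Authorization"]
```

The agent can leak its placeholder all day; it's useless without the proxy, and the proxy can scope, rate-limit, and revoke.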

A lot of this post presents false dichotomies. It assumes the existence of a sandbox that is by definition ephemeral or "cattle-like". Why? There are reasons to do that and reasons not to do that. You can have a durable computer with a network identity and full connectivity, and you can have that computer spin down and stop billing when not in use.

There are a zillion different shapes for addressing these problems, and I'm twitchy because I think people are super path-dependent right now, and it's causing them to miss a lot of valuable options.

[1]: https://fly.io/blog/tokenized-tokens/ (I work at Fly.io but the thing this post talks about is open source).

I'd argue you are still using a sandbox, just at a higher ring (outside the machine/VM), and relying on app/resource-level permissions on each of your external resources to enforce it, which requires _all_ of those external systems to be hardened rather than just the agent host itself. The capabilities a full machine has for exploring and exploiting external, ostensibly secured systems have already been touched on via incidents like the Anthropic internal model jailbreak. [0]

Giving the agent a whole machine also doesn't answer the question of how it can hook into actions that eventually require more perms, and even if you "airgap" those via things like output queues that humans need to approve, that still feels "harnessey" to me.

I feel a bit guilty about debating semantics here, especially as I can't/don't intend to convey any confidence in a "right answer". But my reason for being pedantic is that I do think there are interesting tradeoffs between "P(jailbreak or unexpected capability use | time)" and "increasing power/available capability set", as well as interesting primitives emerging in terms of the components you'd need regardless of where you draw that line (à la paragraph 2).

[0] - https://www-cdn.anthropic.com/3edfc1a7f947aa81841cf88305cb51... (specifically section 5.5.2.4)

  • The post is explicit about what they mean by sandboxing and what the tradeoffs are for the model they're discussing.

    • I've reread it and I stand by my statements that it's an isomorphism, simply replace "container" with "machine AAD/auth-system boundaries" in your example.

      The "Your credentials stay out of the sandbox" problem, to quote them, is what I see your "require your perms system to enforce it" as implicitly solving for.

      (Their "sandbox as cattle" discussion had less bearing on the "which pattern" question to me, since I tend to treat most parts of my agent stack as cattle-like, potentially out of a bias towards that architecture broadly, as I find it's much easier to reason about when as much as possible is disposable/idempotent/eventually consistent. The durable execution point also assumed aspects of the agent scaffold ala prompts don't have to be turned over in deploy, or conversely, can't finish their tasks and then migrate incrementally, and while I might cynically raise an eyebrow at the focus on 25ms for sandbox calls given the dev loops I currently experience, I'd argue there are other ways to solve that problem in both an in or outside of container sandbox pattern.)

      I'd even agree with their final point, "Consistency is the part we haven't answered", though from a different angle than they intended: it's why my focus was on "how do you _constrain_ agent behavior", since that has been, in my experience, the biggest bottleneck to letting agents do more.

> It assumes the existence of a sandbox that is by definition ephemeral or "cattle-like". Why?

Because the moment you use k8s, you have to assume that, apparently. Or so I'm told by all the infrastructure people I speak with. Getting these pods to not disappear just because one process ran out of memory has been a herculean task.

I wish our standard deploy processes produced durable computers that don't break the bank, but that hasn't been an easy requirement with small infra teams.

Author here.

This is an interesting and novel field, so I’m not pretending I know the answers, but this is what worked for us :)

At the end of the day, and oversimplifying things: why would I want to spawn a for loop that calls an API (LLM) into its own dedicated sandbox/computer?

When the model wants to run a command, it’ll tell you so. Doesn’t need to be a local exec, you can run it anywhere, the model won’t know the difference.

The agent loop itself doesn’t need sandboxing. In many cases, most tool calls don’t require sandboxing either. For the tools that do require a computer, you can route those requests there when needed, rather than running the whole software in that sandbox.
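Concretely, the dispatch I have in mind looks something like this toy sketch (names like `run_in_sandbox` are hypothetical, and the remote call is stubbed):

```python
# Toy sketch: the agent loop is just a for loop calling an LLM API.
# Only tool calls that actually need a computer get routed to one;
# everything else runs wherever the loop runs.
def run_in_sandbox(command: str) -> str:
    # In practice this would be an SSH/API call to a remote sandbox; stubbed here.
    return f"(ran `{command}` in sandbox)"


# Tools that are safe to run next to the agent loop, no sandbox needed.
LOCAL_SAFE_TOOLS = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
}


def dispatch(tool_call: dict) -> str:
    name, args = tool_call["name"], tool_call["args"]
    if name in LOCAL_SAFE_TOOLS:
        return LOCAL_SAFE_TOOLS[name](**args)   # runs in-process
    if name == "bash":
        return run_in_sandbox(args["command"])  # routed to the sandbox
    raise ValueError(f"unknown tool: {name}")


assert dispatch({"name": "calculator", "args": {"expr": "2+3"}}) == "5"
assert "sandbox" in dispatch({"name": "bash", "args": {"command": "ls"}})
```

The model never sees where the command ran; it just gets a tool result back.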

To me running the agent loop in the sandbox itself feels like “you should run your API in your DB container because it’ll talk to it at some point”.

I'm also very excited by the different shapes for solving problems in this space. A little worried that the path dependence is ACTUALLY a bit warranted since "popular harness engineering is just claude-wrapping" is a bit of a self-fulfilling prophecy today.

I've heard many claims that because LLMs are tuned to specific harnesses, we should expect worse performance with novel architectures. That seems to make people reluctant to put effort into inventing them.

  • Author here.

    I’m worried about the same (models tuned for specific harnesses).

    We actually work around that by respecting the “contract”. For instance, our harness’ Bash signature is exactly the same as Claude’s. We do our sandboxing stuff and respond using the same format.

    In the “eyes” of the model there’s no difference between what Claude does and what we do (even though the implementation is completely different).

    We basically use Claude's tools as an API contract.
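    Roughly like this sketch (the schema shown is illustrative; the real contract is whatever Anthropic's tool definitions specify, and `my_sandbox_exec` is a stand-in for our implementation):

```python
# Toy sketch of "use Claude's tools as the API contract": expose a tool
# whose input/output shape matches what the model already expects from its
# native Bash tool, while the implementation underneath is our own.
BASH_TOOL_SCHEMA = {
    "name": "bash",
    "input_schema": {
        "type": "object",
        "properties": {"command": {"type": "string"}},
        "required": ["command"],
    },
}


def my_sandbox_exec(command: str) -> str:
    # Stand-in for our actual sandboxed execution.
    return f"stdout of `{command}` from our sandbox"


def handle_bash(tool_input: dict) -> dict:
    # Same signature and response shape as the native tool; completely
    # different implementation. The model can't tell the difference.
    output = my_sandbox_exec(tool_input["command"])
    return {"type": "tool_result", "content": output}


result = handle_bash({"command": "echo hi"})
assert result["type"] == "tool_result"
```

    Keeping the contract identical is what lets us swap the implementation without fighting the model's training.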

Wow, thanks for writing this up.

I'm building an agent sandboxing system for a client atm, and was about to start working on a system of ephemeral, short lived, derived secrets for the agent to use.
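For the derived-secrets part, the shape I'm planning is roughly this (a toy HMAC construction with made-up names, not a production design): mint short-lived, scope-limited tokens from a root secret the agent never holds.

```python
# Toy sketch of ephemeral derived secrets: the agent gets a short-lived,
# scoped token derived from a root secret it never sees. Not a production
# design -- just the shape of the idea.
import hashlib
import hmac
import time

ROOT_SECRET = b"root-secret-kept-outside-the-agent"


def mint(scope: str, ttl: int = 300) -> str:
    """Derive a token valid for `scope` for `ttl` seconds."""
    expires = int(time.time()) + ttl
    msg = f"{scope}:{expires}".encode()
    sig = hmac.new(ROOT_SECRET, msg, hashlib.sha256).hexdigest()
    return f"{scope}:{expires}:{sig}"


def verify(token: str, scope: str) -> bool:
    """Check signature, scope, and expiry on the agent-held token."""
    t_scope, expires, sig = token.rsplit(":", 2)
    msg = f"{t_scope}:{expires}".encode()
    expected = hmac.new(ROOT_SECRET, msg, hashlib.sha256).hexdigest()
    return (
        hmac.compare_digest(sig, expected)
        and t_scope == scope
        and int(expires) > time.time()
    )


tok = mint("repo:read")
assert verify(tok, "repo:read")
assert not verify(tok, "repo:write")  # wrong scope fails
```

A leaked token then only burns one narrow capability for a few minutes, not the root credential.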

Lots of great thoughts to steal in this piece. Thanks again.

I agree with the argument that there are many more than two ways to do this. When I built my AI assistant (https://stavrobot.stavros.io/), for example, I implemented an architecture that has both the ways detailed in the post. The harness runs simultaneously both inside and outside the container (I didn't want the harness to touch the system, and I didn't want LLM-generated code to touch the harness).

It's all tradeoffs, and picking the ones that work for what you want to do is what architecture is. The more informed you are about the tradeoffs, the better you can make your architecture.