Comment by saezbaldo

4 days ago

The thread illustrates a recurring pattern: encrypting the artifact instead of narrowing the authority.

An agent executing code in your environment has implicit access to anything that environment can reach at runtime. Encrypting .env moves the problem one print statement away.

The proxy approaches (Airut, OrcaBot) get closer because they move the trust boundary outside the agent's process. The agent holds a scoped reference that only resolves at a chokepoint you control.

But the real issue is what stephenr raised: why does the agent have ambient access at all? Usually because it inherited the developer's shell, env, and network. That's the actual problem. Not the file format.

The agent has ambient access because it makes it more capable.

For the same reasons we go to extreme measures to try to make dev environments identical with tooling like docker, and we work hard to ensure that there's consistency between environments like staging and production.

Viewing the "state of things" from the context of the user is much more valuable than viewing a "fog of war" minimal view with a lack of trust.

> Usually because it inherited the developer's shell, env, and network. That's the actual problem. Not the file format.

I'd argue this is folly. The actual problem is that the LLM behind the agent is running on someone else's computer, with zero accountability except the flimsy promise of legal contracts (at the best case - when backed by well funded legal departments working for large businesses).

This whole category of problems goes out of scope if the model is owned by you (or your company) and run on hardware owned by you (or your company).

If you want to fix things - argue for local.

  • Your local model is still going to get prompt-injected by third parties if it has an Internet connection. It just isn't regularly phoning home to Google/Anthropic/etc. but tons of other people would be interested in your data (or convincing the model to encrypt your home directory). There's also still no real accountability anywhere. Even if you have the resources to train the model from scratch yourself, it's not like you can audit the weights and understand any potential malicious behaviour encoded in there, beyond the baseline of "yeah these things are kinda unpredictable".

    And on the flip side, a remote model isn't creating risk in and of itself. That comes from the agent harness being permitted to make network and filesystem calls. Even the most evil possible version of ChatGPT isn't going to exfiltrate anything except by somehow social-engineering you into volunteering the information.

    • That's all true but it will fall before "[t]he agent has ambient access because it makes it more capable". Folks can shake their heads or worry or whatever, but feet are going to beat to where it is sweet. Users will follow capability.

      It's why people are hooking Open Claw up to stuff and letting it rip--putting it into a sandbox in a VM in a jail is like getting a brand new smartphone and setting it on Airplane Mode first thing.