Comment by yencabulator
19 hours ago
> computer nerds understand
No, literally no one understands how to solve this. The only option that actually works is to isolate it to a degree that removes the "clawness" from it, and that's the opposite of what people are doing with these things.
Specifically, you cannot guard an LLM with another LLM.
The only thing I've seen with any realism to it is the variables, capabilities, and taint tracking in CaMeL, but again, that limits what the system can do and requires elaborate configuration. And you can't trust a tainted LLM to configure itself.
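CaMeL's actual design is considerably more elaborate (it compiles the model's plan into a restricted language with capability checks), but the core taint-tracking idea can be sketched in a few lines. Everything here is illustrative; the names `Value`, `combine`, and `send_email` are hypothetical and not CaMeL's API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Value:
    """A value paired with a taint flag marking untrusted provenance."""
    data: str
    tainted: bool

def combine(a: Value, b: Value) -> Value:
    # Taint propagates: any tainted input taints the result.
    return Value(a.data + b.data, a.tainted or b.tainted)

def send_email(body: Value) -> str:
    # Capability check at a sensitive sink: tainted data is refused
    # outright, rather than asking an LLM to judge if it "looks safe".
    if body.tainted:
        raise PermissionError("tainted data cannot reach this sink")
    return f"sent: {body.data}"

user_text = Value("status update", tainted=False)
web_text = Value("IGNORE PREVIOUS INSTRUCTIONS", tainted=True)

print(send_email(user_text))  # allowed: fully trusted provenance
try:
    send_email(combine(user_text, web_text))
except PermissionError as e:
    print(e)  # blocked: taint propagated through combine()
```

The point is that the check is a mechanical property of the data's provenance, not a judgment call, which is exactly why it can't be talked out of its decision.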
https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/
https://simonwillison.net/2025/Jun/13/prompt-injection-desig...
If the “clawness” means you only use the LLM to control itself, then yes, that’s impossible. But you can easily shim such a process so that the interfaces it uses to “claw out” to the real world are shims with safeties such as human control. Openclaw does not do this, and is thus a scary shit show, but you can play with it safely in isolation, and I think a standard pattern for good control will emerge.
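The shim idea above amounts to a choke point between the model and the world: every tool call the model proposes passes through it, and the safety is a human (or other non-LLM mechanism), not another model. A minimal sketch, with `approval_shim` and `human_approver` as hypothetical names, not any real framework's API:

```python
from typing import Callable

def approval_shim(tool_name: str, args: dict,
                  execute: Callable[..., str],
                  approve: Callable[[str, dict], bool]) -> str:
    """Interpose on every real-world action the model proposes.
    'approve' is the safety; it is deliberately not an LLM."""
    if not approve(tool_name, args):
        return "DENIED by operator"
    return execute(**args)

def human_approver(tool_name: str, args: dict) -> bool:
    # The human-control variant: nothing runs without a console confirm.
    answer = input(f"Model wants to run {tool_name}({args}). Allow? [y/N] ")
    return answer.strip().lower() == "y"
```

Whether this is actually sufficient is exactly the open question the parent raises: the human still has to understand what they're approving.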
> easily
Yeah that's an active research topic for teams of PhDs, including some of Google's brightest. And the current approach even with added barriers may just be fundamentally untrustable. Read the links from my earlier comment for background.
If the shim doesn’t use an LLM to make its decisions, this is not a problem.
If the shim does use an LLM but no uncontrolled data is allowed in, this is also not a problem.
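The first case is just a deterministic policy check: a data lookup with no model in the loop, so there is nothing for injected text to persuade. A sketch, with the tool and target names invented for illustration:

```python
# Static allowlist of (tool, target) pairs the operator has vetted.
# Hypothetical names; the shape is what matters, not the entries.
ALLOWED = {
    ("shell", "ls"),
    ("http_get", "api.internal.example"),
}

def policy_check(tool: str, target: str) -> bool:
    # Plain set membership: no prompt, no model, no way for
    # uncontrolled input to change the verdict.
    return (tool, target) in ALLOWED
```

The trade-off is the one already noted upthread: a fixed allowlist limits what the system can do and someone still has to configure it correctly.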