Comment by andai

12 hours ago

I'm not sure how solvable it is. It only takes one screw-up to ruin the reputation, and a screw-up is basically guaranteed.

The tech has existed for a while but nobody sane wants to be the one who takes responsibility for shipping a version of this thing that's supposed to be actually solid.

Issues I saw with OpenClaw:

- reliability (mostly due to context mgmt), esp. memory, consistency. Probably solvable eventually

- costs, partly solvable with context mgmt, but the way people were using it was "run in the background and do work for me constantly", so it's basically maxing out your Claude sub (or paying hundreds a day). The economics don't work

- you basically had to use Claude to get decent results, hence the costs (this is better now and will improve with time)

- the "my AI agent runs in a sandboxed docker container but I gave it my Gmail password" situation... (The solution is don't do that, lol)

See also simonw's "lethal trifecta":

>private data, untrusted content, and external communication

https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/

The trifecta (prompt injection) is sorta-kinda solved by the latest models, from what I understand. (But maybe Pliny the Liberator has a different opinion!)

feigewalnuss 8 hours ago

Disclosure: I wrote the linked post.

The "gave it my Gmail password" problem has a better answer than "don't do that." Security kicks itself out of the room when it only says no. Reserve the no for the worst days. The rest of the time, ship a better way.

That's why I built the platform to make credential leaks hard. It takes more than a single prompt. The credential vault is encrypted. Typed secret wrappers prevent accidental logging and serialization. Per-channel process isolation means a compromise in one adapter does not hand an attacker live sessions in the others.
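To make the typed-wrapper idea concrete, here is a minimal Python sketch. It is an illustration only, not the platform's actual code: the `Secret` class and its `reveal` method are hypothetical names. The point is that printing, logging, or serializing the object by accident yields a placeholder or an error, and the raw value only comes out through one explicit call.

```python
import json


class Secret:
    """Hypothetical wrapper: holds a credential, hides it from str/repr/pickle."""

    __slots__ = ("_value",)

    def __init__(self, value: str) -> None:
        self._value = value

    def reveal(self) -> str:
        # The only way to get the raw value; easy to grep for in review.
        return self._value

    def __repr__(self) -> str:
        return "Secret(****)"

    __str__ = __repr__

    def __reduce__(self):
        # Refuse pickling so the value cannot slip into caches or queues.
        raise TypeError("Secret cannot be serialized")


token = Secret("hunter2")

# Accidental logging only ever sees the placeholder.
log_line = f"connecting with {token}"

# Accidental JSON serialization fails loudly instead of leaking.
try:
    json.dumps({"token": token})
    serialized = True
except TypeError:
    serialized = False
```

A wrapper like this only guards against accidents inside one process. It does nothing against code that deliberately calls `reveal()`, which is why it is paired with the encrypted vault and process isolation mentioned above.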

"Don't do that" fails even for users trying their hardest. Good engineering makes mistakes hard and the right answer easy. Architecture carries the weight so the user does not have to.

On the trifecta being "sorta-kinda solved" by newer models: no. Model mitigations are a layer, not a substitute. Prompt injection has the shape of a confused-deputy problem, and the answer to confused deputies has always been capabilities and isolation, not asking the already confused deputy to try harder.

You want the injection to fail EVEN when the model does not catch it.
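A rough sketch of what that looks like in code, under assumptions: the `Capability`, `Denied`, and `execute` names are made up for illustration, and a tool call is modeled as a plain dict. The check lives outside the model, so a tool call produced by injected text is refused even if the model was fully convinced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    action: str  # e.g. "send_email"
    target: str  # e.g. a recipient address


class Denied(Exception):
    """Raised when a tool call has no matching granted capability."""


def execute(tool_call: dict, granted: frozenset) -> str:
    """Run a model-requested tool call only if a matching capability exists."""
    wanted = Capability(tool_call["action"], tool_call["target"])
    if wanted not in granted:
        raise Denied(f"no capability for {wanted.action} -> {wanted.target}")
    # A real runtime would perform the action here; the sketch just reports it.
    return f"ok: {wanted.action} -> {wanted.target}"


# Granted by the user for this task, before any untrusted content is read.
granted = frozenset({Capability("send_email", "me@example.com")})

# What a successfully injected model might ask for.
injected = {"action": "send_email", "target": "attacker@example.com"}
```

Calling `execute(injected, granted)` raises `Denied`, while the same action aimed at `me@example.com` goes through. The model's judgment never enters the decision.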

andai 7 hours ago

Thanks. Yeah, I skipped that part in my comment; there are solutions for a lot of this stuff.
The one I see the most is brokers. Agent talks to a thing, thing has credential and does the task for the agent. Or proxies that magically inject tokens.
I think this only works for credentials though?
It doesn't solve the personal information part (e.g. your actual emails), right?
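
For reference, the broker pattern in a minimal Python sketch. Everything here is hypothetical: `GmailBroker`, `list_subjects`, and `fake_fetch` are stand-ins, and a real broker would run as a separate process or service, since a private attribute is not a security boundary inside one Python process.

```python
class GmailBroker:
    """Hypothetical broker: owns the credential, exposes only narrow operations."""

    def __init__(self, token: str, fetch) -> None:
        self._token = token  # never returned to the caller
        self._fetch = fetch  # stand-in for a real mail API client

    def list_subjects(self, limit: int) -> list[str]:
        # The agent asks for a task; the broker attaches the credential itself.
        return self._fetch(self._token, limit)


def fake_fetch(token: str, limit: int) -> list[str]:
    # Stand-in backend so the sketch runs without network access.
    assert token == "real-token"
    return ["Invoice", "Lunch?", "Your flight"][:limit]


broker = GmailBroker("real-token", fake_fetch)

# The agent side only ever sees the result, not the token.
subjects = broker.list_subjects(2)
```

Note what the agent still receives: the subjects themselves. The sketch keeps the token away from the agent, but whatever data the broker returns is in the agent's context, which is the personal-information question raised here.
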
As for security, my solution was: keep it simple and limit blast radius.
Expect it to blow things up, and set things up so it doesn't matter when it happens.
I don't like docker so I just made a Linux user called agent. Agent can blow up all the files in its own homedir, and cannot read mine.
I felt really clever until I realized there's an even better solution: just give it a laptop (or Mac mini, or server, or whatever we're doing this week).
Same result but less pain in my ass. Switching users is annoying (and sharing files, and permission issues...). Also, worrying about which user I'm running stuff as... The thing just shouldn't be on my machine in the first place. It should have its own!
Functionally, its own Linux user or root on a $3 VPS are the same thing. It blows up the VPS, I just reset it.
For keys, I don't do anything fancy. It can leak all my keys. But if anyone steals them, they can exhaust my entire $5 prepaid balance ;) Blast radius limited.
But yeah, needs, tastes and preferences may differ.