Comment by madrox

17 hours ago

I've long held that the current agent permission model is like playing a game of "Papers, Please", and that most permission models engineers implement in their own AI products are more a measure of how much the user trusts AI than an actual permission check.

I'm of the view that future controls should be more about approving plans and rewinding durable workflows as models get better at avoiding egregious mistakes.

the models will never avoid egregious behavior. think of it like every "good intentions" morality tale. there's almost always some genuine context where that behavior is wanted.

instead, the coding harness, or some other deterministic tool, will need hardcoded security features.
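a rough sketch of what "hardcoded" could mean here, assuming a hypothetical harness that routes every shell command through one fixed gate before running it. the names (`ALLOWED_PROGRAMS`, `is_command_allowed`) are made up for illustration and are not from opencode or any real harness:

```python
import shlex

# Hypothetical fixed policy: the model cannot change these at runtime.
ALLOWED_PROGRAMS = frozenset({"ls", "cat", "grep", "git"})

# Characters that let one command smuggle in another (chaining, pipes,
# redirection, substitution). Rejected even inside quotes, to stay conservative.
SHELL_METACHARACTERS = frozenset(";&|<>`$\n")


def is_command_allowed(command: str) -> bool:
    """Return True only if the command passes the fixed checks."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:
        # Unbalanced quotes: refuse rather than guess what was meant.
        return False
    if not argv:
        return False
    # Only the program name is checked here; arguments are not inspected.
    return argv[0] in ALLOWED_PROGRAMS
```

this is deliberately dumb: it rejects legitimate things like `grep 'a|b' file`, and it says nothing about what an allowed program does with its arguments. the point is only that the check lives in the harness, not in the prompt.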

in opencode, almost all the power comes from bash, and all other permissions are just charades. it's powerful, and insecure because of it.

you can sandbox them, but then you fight the sandbox to pipe in your assets. the sandbox becomes porous because elsewise it's useless.

MCPs don't address much either.

what we are looking for is a portal or protocol that has the model, the harness, and the actions tunneled, like ssh, to some fixed, scoped, and limited shell alongside the assets.

then, the user and LLM can negotiate assets and actions as needed via the protocol.
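a minimal sketch of that negotiation, assuming a made-up `ScopedSession` object standing in for the far end of the tunnel. nothing here is a real protocol; `ask_user`, `request`, and `perform` are hypothetical names, and a real version would sit behind an actual transport:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple

Grant = Tuple[str, str]  # (action, asset), e.g. ("read", "repo/src")


@dataclass
class ScopedSession:
    """Hypothetical limited shell: it only performs what has been granted."""

    ask_user: Callable[[str, str], bool]  # the user's side of the negotiation
    grants: Set[Grant] = field(default_factory=set)
    log: List[str] = field(default_factory=list)

    def request(self, action: str, asset: str) -> bool:
        """The model asks for a capability; the user decides; the result is logged."""
        if (action, asset) in self.grants:
            return True
        approved = self.ask_user(action, asset)
        self.log.append(f"{'granted' if approved else 'denied'} {action} {asset}")
        if approved:
            self.grants.add((action, asset))
        return approved

    def perform(self, action: str, asset: str) -> str:
        """Refuse anything that was not negotiated first."""
        if (action, asset) not in self.grants:
            raise PermissionError(f"{action} on {asset} was never granted")
        self.log.append(f"performed {action} {asset}")
        return f"{action}:{asset}"  # placeholder for the real effect
```

usage would look like `session = ScopedSession(ask_user=prompt_the_human)`, where `prompt_the_human` is whatever UI the harness has. the model calls `request` when it needs something new, and `perform` fails closed for everything else.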

but alas, as your comment suggests, people think there's some perfect context that'll prevent bad things from happening. the libertarian paradise without regulation.

  • I think you're choosing to ignore what I said about the implication of durable workflows, because you seem to be inventing some stories about my comment.

    I find that well-documented plans do pretty well at aligning AI with what I want it to do, and if it does go astray, which as you rightly point out it still can, it would be sufficient if I could undo it with little pain. We do this kind of thing all the time in CI/CD pipelines.

    Even humans can take down production. We have all kinds of guards in place to empower while also defending against the intern accidentally dropping the DB.