Comment by Twirrim

6 days ago

I'm using an MCP to enhance my security posture. I have tools with commands that I explicitly cannot risk the agent executing.

So I run the agent in a VM (it's faster, which I find concerning), and run an MCP on the host that the guest can access, with the MCP also only containing commands that I'm okay with the agent deciding to run.

Despite my previous efforts with skills, I've found agents will still do things like call help on CLIs and find commands that it must never call. By the delights of the way the probabilities are influenced by prompts, explicitly telling it not to run specific commands increases the risk that it will (because any words in the context memory are more likely to be returned).

0 comments

Twirrim

No comments yet

Contribute on Hacker News ↗