Comment by 827a

16 hours ago

Yes, in the same way a programming language would be worse off if they focused all of their effort on building an implementation instead of a language specification.

You could literally, deterministically, zero AI, code-gen a CLI from an MCP specification, just like you can with an OpenAPI specification. I'm sure tools exist that do this. So if you want a CLI, there it is.
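As a rough sketch of that idea (not a reference to any existing tool): MCP tools are described by a name, a description, and a JSON Schema for their input, so a subcommand-per-tool CLI can be derived mechanically. The `create_issue` tool, the `toolcli` program name, and the `build_parser` and `parse_call` helpers below are made up for illustration, and only flat schemas with string, integer, number, and boolean properties are handled.

```python
import argparse

# Hypothetical tool list, shaped like what an MCP server advertises:
# each tool has a name, a description, and a JSON Schema for its input.
TOOLS = [
    {
        "name": "create_issue",
        "description": "Create an issue in a tracker.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "title": {"type": "string", "description": "Issue title"},
                "priority": {"type": "integer", "description": "1 (high) to 5 (low)"},
                "draft": {"type": "boolean", "description": "Create as a draft"},
            },
            "required": ["title"],
        },
    },
]

# JSON Schema scalar types mapped to argparse converters.
_PY_TYPES = {"string": str, "integer": int, "number": float}


def build_parser(tools):
    """Build one subcommand per tool and one flag per schema property."""
    parser = argparse.ArgumentParser(prog="toolcli")
    sub = parser.add_subparsers(dest="tool", required=True)
    for tool in tools:
        cmd = sub.add_parser(tool["name"], help=tool.get("description", ""))
        schema = tool.get("inputSchema", {})
        required = set(schema.get("required", []))
        for name, prop in schema.get("properties", {}).items():
            flag = "--" + name.replace("_", "-")
            help_text = prop.get("description", "")
            if prop.get("type") == "boolean":
                # Booleans become on/off switches that default to False.
                cmd.add_argument(flag, dest=name, action="store_true", help=help_text)
            else:
                cmd.add_argument(
                    flag,
                    dest=name,
                    type=_PY_TYPES.get(prop.get("type"), str),
                    required=name in required,
                    help=help_text,
                )
    return parser


def parse_call(argv, tools=TOOLS):
    """Turn CLI arguments into (tool name, arguments dict) for a tool call."""
    namespace = vars(build_parser(tools).parse_args(argv))
    tool = namespace.pop("tool")
    # Drop optional arguments the user did not supply.
    args = {key: value for key, value in namespace.items() if value is not None}
    return tool, args
```

Calling `parse_call(["create_issue", "--title", "Bug", "--priority", "2"])` yields the tool name plus an arguments dict that a client could forward to the server. No model is involved anywhere in the generation.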

But the problem with a CLI is that it requires a shell environment, and not everywhere you may want to run an agent should or can have access to a shell. MCP enables the harness to tightly control that access. MCP allows the user to easily allowlist/denylist specific tools, or categorize tools into "ask me every time" versus "don't ask me just do it". Doing any of this with a CLI is really hard because CLIs are all very different. Yes, AIs can easily learn how to use them, but that might be exactly what you don't want: "hey AI, don't issue that `aws ec2 terminate-instances` command... ah crap, there it goes. I wish I could have just denylisted its access to that tool."
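To make the allowlist/denylist/"ask me every time" split concrete, here is a minimal sketch of how a harness might gate tool calls. It is not how any particular client implements this: `ToolPolicy`, `call_tool`, and the tool names are hypothetical, and `run` and `confirm` are callbacks the harness is assumed to supply.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"  # "don't ask me, just do it"
    ASK = "ask"      # "ask me every time"
    DENY = "deny"    # never run


class ToolPolicy:
    """Per-tool policy keyed on the tool name. Deny wins over everything else."""

    def __init__(self, allow=(), ask=(), deny=(), default=Decision.ASK):
        self.allow = set(allow)
        self.ask = set(ask)
        self.deny = set(deny)
        self.default = default

    def decide(self, tool_name):
        if tool_name in self.deny:
            return Decision.DENY
        if tool_name in self.allow:
            return Decision.ALLOW
        if tool_name in self.ask:
            return Decision.ASK
        # Unknown tools fall back to the default (asking, unless overridden).
        return self.default


def call_tool(policy, tool_name, args, run, confirm):
    """Run a tool only if the policy (and, when required, the user) permits it."""
    decision = policy.decide(tool_name)
    if decision is Decision.DENY:
        raise PermissionError(f"tool {tool_name!r} is denylisted")
    if decision is Decision.ASK and not confirm(tool_name, args):
        raise PermissionError(f"user declined tool {tool_name!r}")
    return run(tool_name, args)
```

The point of the sketch is that the check keys on a structured tool name, so it never has to parse a free-form command line to guess what is about to happen.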

Not having access to the shell is a big hindrance. I have my agent access GitLab and Jira via CLI tools, and in so many cases jq or python is used to manipulate or combine the data into a more useful format, making use of pipes and temporary files. You can of course limit what an agent can do, most easily by not giving it access to things it shouldn't touch. I suppose there are no easy existing gateway methods for granting fine-grained OS-level permissions back, except perhaps `sudo` and similar tools.

MCPs are impossible to combine this way: everything you feed to or get from them goes through the model and consumes tokens.

  • You’re right that having a shell is the ultimate tool, and an agent with a shell seems to perform better than one without one. But, making shells safe is really damn hard; e.g. in the context of running an agent on behalf of a SaaS customer in your AWS environment. For now some companies are accepting the performance/security tradeoff of disabling the shell and focusing on specialized tools.

    Remember: jq can always be a tool (MCP or otherwise). In this way you can allowlist specific CLI programs and give them to the agent via tools. Making python a tool is more difficult; that would have all of the same RCE injection issues that the shell would have.

    There are isolation stacks that help make “running an agent with a shell on behalf of a customer in the cloud” possible. It’s just very risky. There are a thousand attack vectors, and to a very real degree companies that are getting to this point are re-thinking their cloud infrastructure and architecture from first principles.

    • jq cannot be just an MCP, unless it's acceptable that you pass all data through the context. If that's not acceptable and you want to have it as a tool, then you need some other way to handle the data.

      I think the basic solution to this is to have a "static shell" with modern tools for the agents, one that does not actually execute other binaries. It could have things like jq, curl, piping and redirection to/from session files. Maybe even Python if it can be made safe. If not, then there are a lot of languages that can be.

  • Can an MCP provide prompts for your model to download and use CLIs (and ensure they have the right versions of those tools) in such a way that the data flows through the client side tools?

    The more I read this thread the more I'm convinced that the main value of MCP is to provide a server managed release process. This is the same advantage that SaaS has over client side apps.

    However, MCPs coupled with a client willing to run tools locally can provide the best of both worlds.

    • As far as I know, the only way an MCP can provide you data that doesn't go into the context is by providing URLs to the data, and then the model uses e.g. curl to access that data for data manipulation purposes. You could also return result set ids and provide accessors to such data, but ultimately you'd need to provide powerful accessors to that result set to avoid polluting context. That is exactly the kind of thing a shell, with all its power, already provides.

      It seems like there's little point in MCP in that case. Maybe more point if it was a standard mechanism for MCP to provide additional data, in a completely compatible fashion with all other tools? You could perhaps even pass such URLs to other MCPs. You could have an MCP for jq for doing stream processing. Starts to look a lot like a shell, though.

      Seems like MCPs could also be extended to store auxiliary data to your memory (or filesystem..), and then an additional extension to provide that kind of data as auxiliary data in the calls to MCP.

      Well, even as is, MCP still provides a standard method of using OAuth for accessing such services. And you must use MCP if you wish to add something to the ChatGPT.com web service, so it's easy to see why OpenAI folks are seeing companies going that way.

  • >to manipulate or combine the data into a more useful format

    why not build this directly into MCPs?

    • Hmm, indeed, so maybe I could have all this as an MCP, so I can just easily pass any imaginable data manipulation inside it, and then also have it support calling other MCPs, all inside that one MCP, to avoid filling context with intermediate data..

      Sounds a lot like a shell to me.
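The "result set ids plus accessors" idea from this subthread can be sketched as below. This is a toy, in-process illustration and not part of MCP: `ResultStore`, the `tool_*` functions, the `res-N` handle format, and the sample rows are all invented. Each tool returns only a handle and a row count, so intermediate data stays out of the model's context until something is explicitly peeked at.

```python
import itertools


class ResultStore:
    """Keeps tool results server-side and hands out opaque handles."""

    def __init__(self):
        self._data = {}
        self._ids = itertools.count(1)

    def put(self, value):
        handle = f"res-{next(self._ids)}"
        self._data[handle] = value
        return handle

    def get(self, handle):
        return self._data[handle]


STORE = ResultStore()


def tool_fetch_issues():
    """Stand-in for a remote query; returns a handle instead of the rows."""
    rows = [
        {"id": 1, "state": "open", "title": "Crash on start"},
        {"id": 2, "state": "closed", "title": "Typo in docs"},
        {"id": 3, "state": "open", "title": "Slow export"},
    ]
    return {"handle": STORE.put(rows), "count": len(rows)}


def tool_filter(handle, field, equals):
    """Filter a stored result set; the output is again only a handle."""
    rows = [row for row in STORE.get(handle) if row.get(field) == equals]
    return {"handle": STORE.put(rows), "count": len(rows)}


def tool_peek(handle, limit=3):
    """The only call that puts actual rows into the model's context."""
    return STORE.get(handle)[:limit]
```

Chaining `tool_fetch_issues`, then `tool_filter`, then `tool_peek` behaves like a small pipeline, which rather supports the observation above: every accessor added this way moves the design closer to a shell.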

You prevent the LLM from deleting your instances by not granting its AWS user that permission. Whatever tool you let it use to talk to AWS is irrelevant.
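For illustration, that kind of restriction could look like the IAM policy document built below. The `deny_policy` helper and the `Sid` value are made up; `ec2:TerminateInstances` is the IAM action for terminating instances, and an explicit `Deny` takes precedence over any `Allow` that would otherwise apply. Attaching the policy to the agent's user or role is left out of this sketch.

```python
import json


def deny_policy(actions, resource="*"):
    """Build an IAM policy document that explicitly denies the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyDestructiveActions",  # illustrative label only
                "Effect": "Deny",
                "Action": list(actions),
                "Resource": resource,
            }
        ],
    }


policy = deny_policy(["ec2:TerminateInstances"])
print(json.dumps(policy, indent=2))
```

With something like this in place, it does not matter whether the request arrives via a CLI, an SDK, or an MCP server: the API call itself is refused.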

So the permissions model is the main advantage MCP has over CLIs?

  • Is that so surprising? I thought that was a given. And as soon as remote resources are involved, the old "it's in a docker" peace of mind does not apply.