Comment by _flux
11 hours ago
Not having access to the shell is a big hindrance. I have my agent access Gitlab and Jira via CLI tools and in so many cases jq or python is used to manipulate or combine the data into a more useful format, making use of pipes and temporary files. You can of course limit what an agent can do, most easily by not giving it access to things it shouldn't do. I suppose there are no existing easy gateway methods to grant fine-grained OS-level permissions to add such things back, except perhaps `sudo` and similar tools.
MCPs are impossible to combine this way: everything you feed or get from them goes though the model and consumes tokens.
You’re right that having a shell is the ultimate tool, and an agent with a shell seems to perform better than one without one. But, making shells safe is really damn hard; e.g. in the context of running an agent on behalf of a SaaS customer in your AWS environment. For now some companies are accepting the performance/security tradeoff of disabling the shell and focusing on specialized tools.
Remember: jq can always be a tool (MCP or otherwise). In this way you can allowlist specific CLI programs and give them to the agent via tools. Making python a tool is more difficult; that would have all of the same RCE injection issues that the shell would have.
There are isolation stacks that help make “running an agent with a shell on behalf of a customer in the cloud” possible. It’s just very risky. There’s a thousand attack vectors, and to a very real degree companies that are getting to this point are re-thinking their cloud infrastructure and architecture from first principals.
jq cannot be just an MCP, unless it's acceptable that yuo pass all data through the context. If that's not acceptable and you want to have it as tool, then you need some other way to handle the data.
I think the basic solution to this is to have a "static shell" but with modern tools for the agents, not actually executing other binaries. It could have things like jq, curl, piping and redirection to/from session files. Maybe even Python if it can be made safe. If not, then there are a lot of languages can be.
Can an MCP provide prompts for your model to download and use CLIs (and ensure they have the right versions of those tools) in such a way that the data flows through the client side tools?
The more I read this thread the more I'm convinced that the main value of MCP is to provide a server managed release process. This is the same advantage that SaaS has over client side apps.
However MCPs couples with a client willing to run tools locally can provide the best of both worlds
As far as I know, the only way an MCP can provide you data that doesn't go into the context is by providing URLs to the data, and then the model uses e.g. curl to access that data for data manipulation purposes. You could also return result set ids and provide accessors to such data, but ultimately you'd need to provide powerful accessors to that result set to avoid polluting context. Things like shell with all its power already provides.
It seems like there's little point in MCP in that case. Maybe more point if it was a standard mechanism for MCP to provide additional data, in a completely compatible fashion with all other tools? You could perhaps even pass such URLs to other MCPs. You could have an MCP for jq for doing stream processing. Starts to look a lot like a shell, though.
Seems like MCPs could also be extended to store auxiliary data to your memory (or filesystem..), and then an additional extension to provide that kind of data as auxiliary data in the calls to MCP.
Well, even as is, MCP still provides a standard method of using OAuth for accessing such services. And you must use MCP if you wish to add something to the ChatGPT.com web service, so it's easy to see why OpenAI folks are seeing companies going that way.
>to manipulate or combine the data into a more useful format
why not build this directly into MCPs?
Hmm, indeed, so maybe I could have all this as an MCP, so I can just easily pass any imaginable data manipulation inside it, and then also have it support calling other MCPs, all inside that one MCP, to avoid filling context with intermediate data..
Sounds a lot like a shell to me.