Comment by gertjandewilde

19 hours ago

We built a unified API with a large surface area and ran into a problem when building our MCP server: tool definitions alone burned 50,000+ tokens before the agent touched a single user message.

The fix that worked for us was giving agents a CLI instead. ~80 tokens in the system prompt, progressive discovery through --help, and permission enforcement baked into the binary rather than prompts.

The post covers the benchmarks (Scalekit's 75-run comparison showed 4-32x token overhead for MCP vs CLI), the architecture, and an honest section on where CLIs fall short (streaming, delegated auth, distribution).
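A rough back-of-the-envelope sketch of the token economics being claimed. All numbers here are illustrative assumptions (the 50,000-token figure from the comment above, the rest made up for the sketch), not measurements from the Scalekit benchmark:

```python
# Illustrative model of the overhead described above. The help-call cost
# is an assumption; only the 50,000 figure comes from the comment itself.

MCP_TOOL_DEFS = 50_000   # tool definitions injected up front
CLI_SYSTEM_PROMPT = 80   # one-line pointer to the binary
HELP_CALL = 400          # assumed cost of one --help round trip

def cli_cost(help_calls: int) -> int:
    """Total context cost for a CLI agent that runs `help_calls` lookups."""
    return CLI_SYSTEM_PROMPT + help_calls * HELP_CALL

# Even a deep discovery chain stays far below the up-front MCP cost:
for calls in (1, 3, 10):
    ratio = MCP_TOOL_DEFS / cli_cost(calls)
    print(f"{calls} help calls -> {cli_cost(calls)} tokens ({ratio:.0f}x smaller)")
```

Under these assumptions even ten help round trips cost a fraction of the up-front definitions, which is the shape of the 4-32x range cited above.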

How is progressive discovery not more expensive due to the increased number of steps?

  • > How is progressive discovery not more expensive due to the increased number of steps?

    Why not run the discovery (whether MCP or CLI) in a subagent that returns only the relevant tools? I mean, discovery can be done on a local model, right?

  • I assume it's because the discovery is branching. If an agent using the CLI for GitHub needs to make an issue, it can check the help message for the issue sub-command and go from there; it doesn't need to know anything about pull requests, pipelines, or account configuration, so it never queries those sub-commands.

    Compare this to MCP, where my understanding is that every tool definition is injected into the context up front.
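The branching argument above can be sketched with a toy help-text tree. The tree shape and token counts are hypothetical, loosely modeled on a GitHub-style CLI; nothing here is from a real tool's output:

```python
# Why branching discovery is cheap: the agent only pays for the help text
# along the path it actually takes. All counts are hypothetical.

HELP_TREE = {
    "_help": 120,  # top-level --help listing the sub-commands
    "issue": {"_help": 200, "create": {"_help": 300}},
    "pr": {"_help": 250, "merge": {"_help": 280}},
    "pipeline": {"_help": 500},
}

def path_cost(tree: dict, path: list) -> int:
    """Tokens read when drilling down one sub-command path via --help."""
    cost = tree["_help"]
    for name in path:
        tree = tree[name]
        cost += tree["_help"]
    return cost

def full_cost(tree: dict) -> int:
    """Tokens if every definition is injected up front (the MCP analogy)."""
    return tree["_help"] + sum(
        full_cost(sub) for key, sub in tree.items() if key != "_help"
    )

# Creating an issue touches only the issue branch (120 + 200 + 300),
# while the up-front injection pays for pr and pipeline too:
print(path_cost(HELP_TREE, ["issue", "create"]))
print(full_cost(HELP_TREE))
```

The gap widens with the size of the surface area: every sub-command added to an unused branch raises `full_cost` but leaves `path_cost` untouched.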