
Comment by ewild

16 hours ago

I feel like I don't fully understand MCP. I've done research on it, but I definitely couldn't explain it. I get lost on the fact that, to my knowledge, it's a server with API endpoints that are well defined in a JSON schema, then sent to the LLM; the LLM parses that and decides which endpoints to hit (I'm aware some LLMs use smart calling now, so they load the tool name and description but nothing else until it's called). How exactly are you stopping the LLM from using web search after it hits a certain endpoint in your MCP server? Or is this referring strictly to when you own the whole workflow, where you can then deny web search capabilities on the next LLM step?

Are there any good docs you've liked for learning about it, or good open source projects you used to get familiar? I'd like to learn more.

You need to go back to LLM tools. Before MCP, you could write tools for your LLM to use in ordinary Python, something like this:

    @tool
    def do_great_thing(arg: str) -> str:
        ...  # TODO

The LLM now understands that to do the great thing, it can just call this function and get some result back, which it will use to answer some query from the user.

Notice that the tool uses structured inputs and outputs: the type annotations. They can also be dictionaries (objects in most languages), which gives the LLM powerful capabilities.
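A minimal sketch of such a structured tool. Note the `@tool` decorator and the `Weather` type are illustrative stand-ins, not from any specific framework; real frameworks also extract the signature into a JSON schema for the LLM:

```python
from dataclasses import dataclass

def tool(fn):
    # Stand-in for a framework's @tool decorator: just marks the
    # function so it could later be collected into a tool list.
    fn.is_tool = True
    return fn

@dataclass
class Weather:
    # A structured output the LLM can reason about field by field.
    city: str
    temp_c: float

@tool
def get_weather(city: str) -> Weather:
    """Return weather data for a city (stubbed result)."""
    return Weather(city=city, temp_c=21.5)

result = get_weather("Lisbon")
```

Because the input and output shapes are declared, the model doesn't have to guess how to call the function or parse free text out of its result.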

Now, imagine you want to write this in any language. What do you do?

Normally, you create some sort of API for that. Something like good old RPC. Which is essentially what MCP does: it defines a JSON-RPC API for tools, but it also adds some useful stuff, like access to static resources, elicitation (asking the user for input outside of the LLM's chat) and, since the MCP auth spec, a unified authorization system based on OAuth.

This gives you a lot of advantages over a CLI, as well as some disadvantages; both make sense to use. For web usage, you just want the LLM to call curl! No point making that an MCP server (except perhaps if you want to authorize access to URLs). However, if you have an API that exposes a lot of stuff (e.g. JIRA), you definitely want an MCP server for that. Not only does it get only the access you want to give the LLM instead of using your own credentials directly, you can now have a company-wide policy for what agents are allowed to do when accessing your JIRA (or whatever) system.
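Under the hood the wire format is plain JSON-RPC 2.0; a sketch of the two core requests an MCP client sends (the `tools/list` and `tools/call` method names are per the MCP spec, the tool name and argument reuse the earlier example, and the ids are arbitrary):

```python
import json

# Ask the server what tools it exposes.
list_req = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Invoke one of them with typed arguments.
call_req = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "do_great_thing", "arguments": {"arg": "hello"}},
}

wire = json.dumps(call_req)  # what actually goes over stdio or HTTP
```

The server's reply carries a matching `id`, which is how the client pairs responses with requests over a single stream.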

A big disadvantage of MCP is that all the metadata declaring the RPC API takes a lot of context, but recent agents are smart about that and load it partially and lazily as required, which should fix the problem.

In summary: whatever you do, you'll end up with something like MCP once you introduce "enterprise" users and not just yolo kids giving the LLM access to their browsers with their real credentials and unfiltered access to all their passwords.

  • For my requirements, over 90% of the LLM integrations and rollouts have it exactly backwards. The only thing you want these agents doing is building modular, testable traditional CLI tools which can then be scripted just as easily by a human or agent with almost no context/learning required. Humans must distill the probabilism of agent output into composable deterministic functions.

    Pushing opaque probabilistic black boxes into the execution of your day-to-day operations, communications, whatever it is, is horrible even if it works. At best it's a Pyrrhic victory. I see startups using these agents to mitigate healthcare disputes.

    There's no such thing as a domain that resists modeling but for which you could accept a probabilistic result. Probabilistic must also mean probabilistically acceptable. We have words for the only counterexamples: drafting, brainstorming, maybe triage.

There is not a lot to learn to understand the basics, but maybe one step that's not necessarily documented is the overall workflow and why it's arranged this way. You mentioned the LLM "using web search" and it's a related idea: LLMs don't run web searches themselves when you're using an MCP client, they ask the client to do it.

You can think of an MCP server as a process exposing some tools. It runs on your machine communicating via stdin/stdout, or on a server over HTTP. It exposes a list of tools, each tool has a name and named+typed parameters, just like a list of functions in a program. When you "add" an MCP server to Claude Code or any other client, you simply tell this client app on your machine about this list of tools and it will include this list in its requests to the LLM alongside your prompt.
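The shape of one entry in that list looks roughly like this (field names modeled on typical JSON-schema tool declarations, such as the ones the Anthropic API accepts; treat the exact keys as illustrative):

```python
# One tool declaration the client includes with each request to the LLM.
tool_decl = {
    "name": "github__pr_list",
    "description": "List pull requests in a repository",
    "input_schema": {
        "type": "object",
        "properties": {
            "org": {"type": "string"},
            "repo": {"type": "string"},
        },
        "required": ["org", "repo"],
    },
}
```

This is also where the context cost comes from: every declared tool ships its name, description, and schema alongside your prompt.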

When the LLM receives your prompt and decides that one of the tools listed alongside would be helpful to answer you, it doesn't return a regular response to your client but a "tool call" message saying: "call <this tool> with <these parameters>". Your client does this, and sends back the tool call result to the LLM, which will take this into account to respond to your prompt.

That's pretty much all there is to it: LLMs can't connect to your email or your GitHub account or anything else; your local apps can. MCP is just a way for LLMs to ask clients to call tools and provide the response.

1. You: {message: "hey Claude, how many PRs are open on my GitHub repo foo/bar?", tools: [... github__pr_list(org:string, repo:string) -> [PullRequest], ...]}
2. Anthropic API: {tool_use: {id: 123, name: github__pr_list, input: {org: foo, repo: bar}}}
3. You: {tool_result: {id: 123, content: [list of PRs in JSON]}}
4. Anthropic API: {message: "I see 3 PRs in your repo foo/bar"}

that's it.
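The four-step exchange above maps to a small loop on the client side. A hypothetical sketch, where `call_llm` and the message shapes are stand-ins for a real provider SDK, not an actual API:

```python
def run_tool_loop(prompt, tools, call_llm):
    # tools: dict mapping tool name -> plain Python callable
    # call_llm: stand-in for the provider API; returns either a
    #   {"type": "tool_use", ...} request or a final message.
    messages = [{"role": "user", "content": prompt}]
    while True:
        reply = call_llm(messages, tools)
        if reply["type"] != "tool_use":
            return reply["content"]          # plain answer: we're done
        fn = tools[reply["name"]]            # the CLIENT runs the tool,
        result = fn(**reply["input"])        # not the LLM
        messages.append({"role": "tool_result",
                         "id": reply["id"],
                         "content": result})  # feed the result back
```

The LLM only ever sees and emits messages; everything with side effects happens inside `fn(...)`, on your machine.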

If you want to go deeper the MCP website[1] is relatively accessible, although you definitely don't need to know all the details of the protocol to use MCP. If all you need is to use MCP servers and not blow up your context with a massive list of tools that are included with each prompt, I don't think you need to know much more than what I described above.

[1] https://modelcontextprotocol.io/docs/learn/architecture

  • Maybe it's because of the example, but if the LLM knows the GitHub CLI, and I bet it does, shouldn't it be able to run the commands (or type them for us) to count the open PRs on foo/bar?

    However I see the potential problem of the LLM not knowing an obscure proprietary API. The traditional solution has been writing documentation, maybe on a popular platform like Postman. In that case the URL of the documentation could be enough, or an export in JSON. It usually contains examples too. I dread having to write and maintain both the documentation for humans and the MCP server for bots.

    • It can, and it does, especially combined with skills (context files). It can hit REST APIs with curl just fine. MCP is basically just another standard.

      Where it comes in handy has mostly been in distribution honestly. There's something very "open apis web era" about MCP servers where because every company rushed to publish them, you can write a lot of creative integrations a bit more easily.

LLM is not doing the work.. your code is doing the work, LLM is just telling you which of the functions (aka tools) you should run.

web search is also another tool and you can gate it with logic so LLMs don’t go rogue.

that’s kinda simplest explanation i guess
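The "gate it with logic" part can be sketched in a few lines: the client simply stops advertising the web search tool once a sensitive tool has been called. The tool names here are made up for illustration:

```python
# Tools whose use should cut off web access afterwards (illustrative).
SENSITIVE = {"read_patient_record"}

def available_tools(all_tools, call_history):
    """Return the tool list to advertise to the LLM on the next turn.

    all_tools: dict of name -> tool; call_history: names already called.
    Once any sensitive tool has run, web_search is withheld, so the
    model can no longer even request it.
    """
    touched_sensitive = any(name in SENSITIVE for name in call_history)
    return {name: t for name, t in all_tools.items()
            if not (touched_sensitive and name == "web_search")}
```

Since the LLM can only call tools the client lists, removing a tool from the list is a hard block, not a polite request in the prompt.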

  • Ok, so in a situation like regular orchestration, you would essentially lay out all possible steps the LLM can take in your code in a big orchestration layer, and if it hits the sensitive endpoint, the orchestration past that point blocks web search. In the design, that is. But for something like a Manus-style agent, where you're outsourcing all the work but allowing it to hit your MCP server, it just becomes a regular API the LLM can call.