Comment by paulddraper

3 months ago

> No major models have any direct knowledge of MCP.

Claude and ChatGPT both support MCP, as does the OpenAI Agents SDK.

(If you mean the LLM itself, it is "known" at least as much as any other protocol. For whatever that means.)

19 comments


whoknowsidont 3 months ago

>it is "known" at least as much as any other protocol.

No. It is not. Please understand what the LLMs are doing. Neither Claude nor ChatGPT nor any major model knows what MCP is.

They know how to function & tool call. They have zero trained data on MCP.

That is a factual statement, not an opinion.

Bockit 3 months ago
This is probably a semantics problem. You’re right. The models don’t know how to mcp. The harness they run in does though (Claude Code, Claude Desktop, etc.), and dynamically exposes mcp tools as tool calls.
- whoknowsidont 3 months ago
  
  >dynamically exposes mcp tools as tool calls.
  It doesn't even do that. It's not magic.
- llbbdd 3 months ago
  
  HN loves inventing semantics problems around AI. It's gotten really, really annoying and I'm not sure the people doing it are even close to understanding it.
choilive 3 months ago
That is an easily falsifiable statement. If I ask ChatGPT or Claude what MCP is, Model Context Protocol comes up, and furthermore they can clearly explain what MCP does. That seems unlikely to be a coincidental hallucination.
- whoknowsidont 3 months ago
  
  Training data =/= web search
  Both ChatGPT and Claude will perform web searches when you ask them a question. The fact that you got this confused is ironically topical.
  But you're still misunderstanding the principal point, because at some point these models will undoubtedly have access to that data and be trained on it.
  But they didn't need to be, because function & tool calling is already trained into these models and MCP does not augment this functionality in any way.
  
- cstrahan 3 months ago
  
  You're misinterpreting OP.
  OP is saying that the models have not been trained on particular MCP use, which is why MCP servers serve up tool descriptions, which are fed to the LLM just like any other text -- that is, these descriptions consume tokens and take up precious context.
  Here's a representative example, taken from a real world need I had a week ago. I want to port a code base from one language to another (ReasonML to TypeScript, for various reasons). I figure the best way to go about this would be to topologically sort the files by their dependencies, so I can start with porting files with absolutely zero imports, then port files where the only dependencies are on files I've already ported, and so on. Let's suppose I want to use Claude Code to help with this, just to make the choice of agent concrete.
  How should I go about this?
  The overhead of the MCP approach would be analogous to trying to cram all of the relevant files into the context, and asking Claude to sort them. Even if the context window is sufficient, that doesn't matter because I don't want Claude to "try its best" to give me the topological sort straight from its nondeterministic LLM "head".
  So what did I do?
  I gave it enough information about how to consult build metadata files to derive the dependency graph, and then had it write a Python script. The LLM is already trained on a massive corpus of Python code, so there's no need to spoon feed it "here's such and such standard library function", or "here's the basic Python syntax", etc -- it already "knows" that. No MCP tool descriptions required.
  And then Claude Code spits out a script that, yes, I could have written myself, but it does it in maybe 1 minute total of my time. I can skim the script and make sure that it does exactly what it should be doing. Given that this is code, and not nondeterministic wishy washy LLM "reasoning", I know that the result is both deterministic and correct. The total token usage is tiny.
  If you look at what Anthropic and Cloudflare have to say on the matter (see https://www.anthropic.com/engineering/code-execution-with-mc... and https://blog.cloudflare.com/code-mode/), it's basically what I've described, but without explicitly telling the LLM to write a script / reviewing that script.
  If you have the LLM write code to interface with the world, it can leverage its training in that code, and the code itself will do what code does (precisely what it was configured to do), and the only tokens consumed will be the final result.
  MCP is incredibly wasteful and provides more opportunities for LLMs to make mistakes and/or get confused.
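For readers who want to picture the kind of script described above, here is a minimal sketch. It is not the script from the comment: the `deps` mapping and its file names are hypothetical, and it assumes the dependency graph has already been extracted from the build metadata into a plain dictionary. It uses the standard library's `graphlib` (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter


def port_order(deps: dict[str, set[str]]) -> list[str]:
    """Return files ordered so each one comes after everything it imports."""
    # TopologicalSorter takes node -> predecessors, which matches
    # "file -> the files it imports", so dependencies are emitted first.
    return list(TopologicalSorter(deps).static_order())


# Hypothetical dependency graph: file -> files it imports.
deps = {
    "Utils.re": set(),
    "Types.re": set(),
    "Api.re": {"Types.re", "Utils.re"},
    "App.re": {"Api.re", "Utils.re"},
}

order = port_order(deps)
for position, path in enumerate(order, start=1):
    print(position, path)

# A circular import cannot be ordered; graphlib reports it instead of guessing.
try:
    port_order({"A.re": {"B.re"}, "B.re": {"A.re"}})
    cycle_detected = False
except CycleError:
    cycle_detected = True
```

The point of the approach survives in the sketch: the ordering comes from code that can be read and rerun, and only the final list needs to go back into the model's context.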
cookiengineer 3 months ago

> That is a factual statement,
I think most people, even most devs, don't actually know how crappily an MCP client is built. It's essentially an MITM approach: the client sends the LLM on the other end a crappy pretext, as JSON, describing which tools are mounted and how to call their methods, and then tries to intelligently guess which response was a tool call.
And that intelligent guess is where it gets interesting for pentesting, because no guess like that can be failsafe.
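A rough sketch of the pattern described above, under loud assumptions: the tool list, the `{"tool": ..., "arguments": ...}` reply convention, and the function names are all made up for illustration, and this is not how any particular client is implemented. It only shows why "guess whether the reply was a tool call" is a heuristic rather than a guarantee.

```python
import json

# Hypothetical tool list; a real client would get this from its servers.
TOOLS = [
    {
        "name": "read_file",
        "description": "Read a file from disk",
        "parameters": {"path": "string"},
    },
]


def build_preamble(tools: list[dict]) -> str:
    """Describe the mounted tools as text that is sent along with the prompt."""
    header = 'To call a tool, reply with JSON: {"tool": <name>, "arguments": {...}}\n'
    return header + json.dumps(tools, indent=2)


def parse_reply(reply: str, tools: list[dict]) -> tuple[str, object]:
    """Guess whether a model reply is a tool call or ordinary text."""
    known = {tool["name"] for tool in tools}
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return ("text", reply)
    if (
        isinstance(data, dict)
        and isinstance(data.get("tool"), str)
        and data["tool"] in known
        and isinstance(data.get("arguments"), dict)
    ):
        return ("tool_call", data)
    # Valid JSON that does not match the convention is treated as plain text.
    return ("text", reply)


kind, payload = parse_reply(
    '{"tool": "read_file", "arguments": {"path": "notes.txt"}}', TOOLS
)
```

Anything that can get text shaped like the convention into the model's output, for example content quoted from a fetched document, passes the same check, which is the pentesting angle mentioned above.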
paulddraper 3 months ago
> They have zero trained data on MCP.
They have significant data trained on MCP.
> They know how to function & tool call.
Right. You can either use MCP to transmit those tool calls, or you can create some other interface.
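To make "use MCP to transmit those tool calls" concrete, here is a small sketch of wrapping an already-decided tool call in an MCP-style JSON-RPC 2.0 request. The `tools/call` method name and the `name`/`arguments` params follow the MCP specification as I understand it; the helper name, the id counter, and the example tool are assumptions for illustration, and no transport (stdio, HTTP) is shown.

```python
import json
from itertools import count

# Hypothetical request-id generator; JSON-RPC only needs ids to be unique.
_request_ids = count(1)


def to_mcp_request(tool_name: str, arguments: dict) -> dict:
    """Wrap a tool call chosen by the model in a JSON-RPC 2.0 envelope."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# The model only produced a tool name and arguments; the envelope is added
# by the client, outside the model.
request = to_mcp_request("read_file", {"path": "notes.txt"})
wire_message = json.dumps(request)
print(wire_message)
```

In this sketch the model never sees the envelope, only the tool's name, description, and result, which is consistent with the "some other interface" alternative: a different envelope could carry the same call.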
- whoknowsidont 3 months ago
  
  >They have significant data trained on MCP.
  No they don't lol.
  
numpad0 3 months ago
(pedantry) it's something humans are talking about a lot, so up-to-date models do know about it...
- whoknowsidont 3 months ago
  
  Most likely! It's hard to qualify which specific models and versions I'm talking about because they're constantly being updated.
  But the point is that function & tool calling was already built in. If you take a model from before "MCP" was even referenced on the web it will still _PERFECTLY_ interact with not only other MCP servers and clients but any other API as well.