Something like https://github.com/huggingface/smolagents
Needs a sandbox, otherwise blindly executing generated code is not acceptable
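To make the sandbox point concrete, here is a minimal sketch of the weakest useful precaution: running generated code in a separate interpreter process with a timeout instead of calling `exec` in the agent's own process. The helper name `run_untrusted` and its return shape are made up for illustration. This is not a real sandbox: the child process can still touch the filesystem and network, so for real isolation you would want a container, a VM, or something similar.

```python
import subprocess
import sys
import tempfile


def run_untrusted(code: str, timeout_s: float = 2.0) -> dict:
    """Run `code` in a separate Python process and capture its output.

    Hypothetical helper: process separation plus a timeout only.
    It does NOT restrict filesystem or network access.
    """
    # Use a throwaway working directory so relative paths do not land in ours.
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                # -I is isolated mode: ignores PYTHON* env vars and user site-packages.
                [sys.executable, "-I", "-c", code],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising this exception.
            return {"ok": False, "stdout": "", "stderr": "timed out"}
    return {
        "ok": proc.returncode == 0,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }
```

Something like `run_untrusted("print(1 + 1)")` gives back the captured stdout, and a runaway loop is cut off by the timeout rather than hanging the agent.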
https://www.anthropic.com/engineering/advanced-tool-use#:~:t...
Anthropic themselves support this style of tool calling with code first party now too.
Yup, that’s what I’ve been talking about.
Cloudflare published this article, which I guess is relevant: https://blog.cloudflare.com/code-mode/
this assumes generated code is always correct and does exactly what's needed.
Same for MCP - there is always a chance an agent will mess up the tool use.
This kind of LLM non-determinism is something you have to live with. And it’s the reason I personally think the whole agents thing is way over-hyped - who needs systems that only work 2 times out of 3, lol.
The fraction is a lot higher than 2/3 and tool calls are how you give it useful determinism.
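One way to read "useful determinism": the model only proposes a call, and ordinary code decides whether it is well-formed before anything runs. Below is a rough sketch of that idea. The `TOOLS` registry, the `add` tool, and the `{"name": ..., "arguments": ...}` call shape are all assumptions for illustration, not any particular framework's API.

```python
import json
from typing import Any, Callable


def add(a: int, b: int) -> int:
    """A deterministic tool: same inputs, same output."""
    return a + b


# Hypothetical registry: tool name -> (function, expected argument types).
TOOLS: dict[str, tuple[Callable[..., Any], dict[str, type]]] = {
    "add": (add, {"a": int, "b": int}),
}


def dispatch(raw_call: str) -> dict:
    """Validate a model-proposed tool call (JSON text) and run it only if well-formed."""
    try:
        call = json.loads(raw_call)
    except json.JSONDecodeError as exc:
        return {"ok": False, "error": f"invalid JSON: {exc}"}
    if not isinstance(call, dict):
        return {"ok": False, "error": "call must be a JSON object"}

    name = call.get("name")
    if not isinstance(name, str) or name not in TOOLS:
        return {"ok": False, "error": f"unknown tool: {name!r}"}

    func, spec = TOOLS[name]
    args = call.get("arguments")
    if not isinstance(args, dict) or set(args) != set(spec):
        return {"ok": False, "error": f"expected arguments {sorted(spec)}"}

    # Exact type match, so a JSON string "2" or a bool is rejected for an int slot.
    for key, expected in spec.items():
        if type(args[key]) is not expected:
            return {"ok": False, "error": f"argument {key!r} must be {expected.__name__}"}

    return {"ok": True, "result": func(**args)}
```

A malformed call comes back as an error the agent loop can feed to the model for another try, so the non-determinism stays in the proposal step and not in the execution step.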