Comment by jmward01

5 days ago

Programmatic Tool Calling has been an obvious next step for a while. It is clear we are heading towards code as a language for LLMs, so defining that language is very important. But I'm not convinced by tool search. Good context engineering leaves only the tools you will need in context, so adding a search on top of tools you're going to use anyway is just more overhead. What is needed is a more compact tool definition language, like, I don't know, the way every programming language ever defines functions. We also need objects (which hopefully Programmatic Tool Calling solves, or the next version will). In the end I want to drop objects into context with exposed methods, and the model knows the type and what is callable on that type.

Why exactly do we need a new language? The agents I write get access to a subset of the Python SDK (i.e. non-destructive), packages, and custom functions. All this ceremony around tools and pseudo-RPC seems pointless given LLMs are extremely capable of assembling code by themselves.
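A rough sketch of that setup: hand the model's generated code a whitelisted, non-destructive set of callables and nothing else. The names and whitelist here are invented, and `exec` with stripped builtins is only a demonstration of the idea, not a real sandbox.

```python
# Illustrative only: expose a vetted, non-destructive subset of callables
# to agent-generated code. NOT a real sandbox -- exec with an emptied
# __builtins__ dict can still be escaped by determined code.
import math
import statistics

ALLOWED = {
    "sqrt": math.sqrt,
    "mean": statistics.mean,
    "len": len,
    "sum": sum,
}

def run_agent_code(source: str) -> dict:
    """Execute model-written code with only whitelisted names in scope."""
    namespace = {"__builtins__": {}}  # drop builtins, then add back safe names
    namespace.update(ALLOWED)
    exec(source, namespace)
    # Return only the names the agent's code itself created.
    return {k: v for k, v in namespace.items()
            if k not in ALLOWED and k != "__builtins__"}

result = run_agent_code("answer = mean([sqrt(4), sqrt(16)])")
print(result["answer"])  # 3.0
```

In practice you would run this inside a proper isolation boundary (subprocess, container, or a restricted interpreter); the whitelist only defines the API surface, not the security model.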

  • I'm imagining something more like Rexx with quite high level commands. But that certainly blurs the line between programming language and shell.

    The reason for choosing higher-level constructs is token use. We certainly reduce the number of tokens by using a shell-like command language, but of course that also reduces expressiveness.

    I've been meaning to get round to a Plan 9 style where the LLM reads and writes files rather than running commands. I'm not sure whether that's going to be more useful than just running commands, but it is for an end user, because they only have to think about one paradigm - reading/writing files.

    • Between this:

      > In the end I want to drop objects into context with exposed methods and it knows the type and what is callable on they type.

      And this:

      > I'm imagining something more like Rexx with quite high level commands. But that certainly blurs the line between programming language and shell.

      It really sounds like you're trying to reinvent powershell.

      A shell. A scripting language. Everything is a self-describing object, piped on the shell, with exposed methods to call.

  • Does this "non-destructive subset of the Python SDK" exist today, without needing to bring in, say, a whole WebAssembly runtime?

    I am hoping something like CEL (with verifiable runtime guarantees) but the syntax being a subset of Python.
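One lightweight approximation of a CEL-like guarantee, short of a full runtime: statically reject any program that uses nodes outside a small expression-oriented subset of Python. This is a sketch of my own, and it only gives syntactic guarantees (no imports, no attribute access, no loops), not the verified evaluation-cost guarantees CEL provides.

```python
# Sketch: parse model-written code and accept it only if every AST node
# belongs to a small, expression-oriented subset of Python.
import ast

ALLOWED_NODES = (
    ast.Module, ast.Expr, ast.Assign, ast.Name, ast.Load, ast.Store,
    ast.Constant, ast.BinOp, ast.Add, ast.Sub, ast.Mult, ast.Div,
    ast.Call, ast.List, ast.Tuple, ast.Compare, ast.Lt, ast.Gt, ast.Eq,
    ast.IfExp, ast.BoolOp, ast.And, ast.Or,
)

def is_allowed(source: str) -> bool:
    """True iff the source parses and uses only whitelisted node types."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    return all(isinstance(node, ALLOWED_NODES) for node in ast.walk(tree))

print(is_allowed("x = (a + b) * 2 if a < b else 0"))  # True
print(is_allowed("import os; os.remove('x')"))        # False: Import rejected
```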

  • Woah woah woah, you’re ignoring a whole revenue stream caused by deliberately complicating the ecosystem, and then selling tools and consulting to “make it simpler”!

    Think of all the new yachts our mega-rich tech-bros could have by doing this!

    • Tool search formalises what a lot of teams have been working towards. I had previously called it a "tool caller": the LLM knew there were tools for certain domains, and when a domain was mentioned, the tools for that domain would be loaded. This looks a bit smarter.
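The domain-gated loading described above can be sketched in a few lines. Everything here (the domains, tool names, and trigger logic) is invented for illustration; a real implementation would match domains semantically and fetch full JSON schemas rather than name lists.

```python
# Sketch of the "tool caller" pattern: keep only lightweight domain stubs
# in context, and load a domain's full tool definitions on first mention.
TOOL_DOMAINS = {
    "calendar": ["create_event", "list_events", "delete_event"],
    "email": ["send_email", "search_inbox"],
    "crm": ["lookup_contact", "update_deal"],
}

loaded_tools: dict[str, list[str]] = {}

def mention_domain(text: str) -> list[str]:
    """Load tool definitions for any domain mentioned in the text."""
    for domain, tools in TOOL_DOMAINS.items():
        if domain in text.lower() and domain not in loaded_tools:
            loaded_tools[domain] = tools  # in practice: fetch full schemas here
    return sorted(t for tools in loaded_tools.values() for t in tools)

print(mention_domain("Can you check my calendar for Friday?"))
# ['create_event', 'delete_event', 'list_events']
```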

Exactly, instead of this mess, you could just give it something like .d.ts.

Easy to maintain, test etc. - like any other library/code.

You want structure? Just export * as Foo from '@foo/foo' and let it read .d.ts for '@foo/foo' if it needs to.

But wait, it's also good at writing code. Give it write access to it then.

Now it can talk to SQL Server, gRPC, GraphQL, REST, JSON-RPC over WebSocket, or whatever, i.e. your USB.

If it needs some tool, it can import or write it itself.

The next realisation may be that a Jupyter/Pluto/Mathematica/Observable-like, but more book-like, AI<->human interaction platform works best for the communication itself (too much raw text - it'd take you days to comprehend what it spat out in 5 minutes; better to have summary pictures, interactive charts, whatever).

With voice-to-text because poking at flat squares in all of this feels primitive.

For improved performance you can peer it with other sessions (within your team, or global/public) - surely others solved similar problems to yours where you can grab ready solutions.

It already has the ability to create a tool that copies itself and can talk to the copy, so it's fair to call this system "skynet".

The latest MCP specifications (2025-06-18+) introduced crucial enhancements like support for Structured Content and the Output Schema.

Smolagents makes use of this and handles tool output as objects (e.g. dict). Is this what you are thinking about?

Details in a blog post here: https://huggingface.co/blog/llchahn/ai-agents-output-schema
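For concreteness, here is the shape of a structured result under the newer spec: the tool declares an `outputSchema`, and its result carries a `structuredContent` object matching that schema alongside the usual text content. The field names follow the 2025-06-18 MCP spec; the weather tool itself is made up.

```python
# Sketch of an MCP structured tool result. The tool advertises this schema:
output_schema = {
    "type": "object",
    "properties": {
        "temperature_c": {"type": "number"},
        "conditions": {"type": "string"},
    },
    "required": ["temperature_c", "conditions"],
}

# ...and its call result carries both text and a matching structured object:
tool_result = {
    "content": [{"type": "text",
                 "text": '{"temperature_c": 21.5, "conditions": "cloudy"}'}],
    "structuredContent": {"temperature_c": 21.5, "conditions": "cloudy"},
}

# A client (or a library like smolagents) can hand the model a real dict
# instead of making it re-parse JSON out of a text blob.
weather = tool_result["structuredContent"]
print(weather["temperature_c"])  # 21.5
```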

  • We just need a simple language syntax like Python's, and for models to be trained on it (which they already mostly are):

    class MyClass(SomeOtherClass):
        def my_func(self, a: str, b: int) -> int:
            """Put the description (if needed) in the body for the LLM."""
            ...

    That is way more compact than the JSON schemas out there. Then you can have "available objects" listed like `o1 (MyClass), o2 (SomeOtherClass)` as the starting context. Combine this with programmatic tool calling and there you go: much, much more compact. It binds well to actual code and is very flexible. This is the obvious direction things are going. I just wish Anthropic and OpenAI would realize it and define it/train models to it sooner rather than later.

    edit: I should also add that inline response should be part of this too. The model should be able to emit ```<code here>``` and keep executing, with only blocking calls requiring it to stop generating until the block frees up. So, for instance, the model could emit ```r = start_task(some task)```, generate other things, then ```print(r.value())``` (probably with various awaits and the like here, but you all get the point).
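That non-blocking execution model can be approximated today with futures. This is only a sketch: a thread pool stands in for the runtime, `start_task` and its arguments are invented, and `.result()` plays the role of the `r.value()` call above.

```python
# Sketch: start_task returns a handle immediately, "generation" continues,
# and only reading the result blocks -- mirroring the inline-response idea.
import time
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor()

def start_task(seconds: float, result: str):
    """Kick off a slow tool call; return a future-like handle at once."""
    def work():
        time.sleep(seconds)
        return result
    return pool.submit(work)

r = start_task(0.2, "report ready")                  # model emits this block...
interim = "model keeps generating other tokens here"  # ...and keeps going
print(r.result())                                     # blocks only here
pool.shutdown()
```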

I specifically built this as an MCP server. It works as an MCP server that proxies to other MCP servers, converts the tool definitions into TypeScript annotations, and asks your LLM to generate TypeScript that runs in a restricted VM to make tool calls that way. It's based on the Apple white paper on this topic from last year. https://github.com/zbowling/mcpcodeserver

I'm not sure that we need a new language so much as just primitives from AI gamedev, like behavior trees along with the core agentic loop.

  • After implementing a behaviour tree library and realising the power of select & sequence I found myself wondering why they aren’t used more widely.

    I’ve never done anything in crypto, but I watched in horror as people created immutable contracts with essentially JavaScript programs. Surely it would be much easier to reason about/verify scripts written as a behaviour tree with a library of queries and actions. Even being able to limit the scope of modifications would be a win.
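For readers who haven't met them, select and sequence are tiny: sequence is an ordered AND over children, select is an OR with fallback. A minimal sketch in Python (the leaf names and state dict are invented; real libraries add running states, ticks, and blackboards on top of this core):

```python
# Minimal behaviour-tree primitives: children are callables over shared state.
def sequence(*children):
    """Succeed only if every child succeeds, in order (logical AND)."""
    def run(state):
        return all(child(state) for child in children)
    return run

def select(*children):
    """Succeed on the first child that succeeds (OR with fallback)."""
    def run(state):
        return any(child(state) for child in children)
    return run

# Leaves: conditions and actions reading/writing a shared state dict.
has_key   = lambda s: s.get("has_key", False)
open_door = lambda s: s.setdefault("door", "open") == "open"
pick_lock = lambda s: s.get("lockpicking", 0) > 3

enter_room = select(
    sequence(has_key, open_door),  # preferred path: use the key
    pick_lock,                     # fallback: pick the lock
)

print(enter_room({"has_key": True}))   # True
print(enter_room({"lockpicking": 1}))  # False: no key, can't pick
```

Because the tree is data built from a fixed library of queries and actions, verifying it means checking a small structure, not arbitrary code, which is exactly the property the comment above wants for contracts.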

Reminds me a bit of the problem that GraphQL solves for the frontend, which avoids a lot of round-trips between client and server and enables more processing to be done on the server before returning the result.

Giving the AI an actual programming language (functions + objects) genuinely does seem like a good alternative to the MCP mess we have right now.