Comment by floppyd

1 day ago

> This is the key insight: we’re just telling the LLM “here are your tools, here’s the format to call them.” The LLM figures out when and how to use them.

This really blew my mind back in the ancient times of 2024-ish. I remember when the idea of agents first reached me: I started reading various "here's how I built an agent that does this" articles, and I was really frustrated at not understanding how the hell the LLM "knows" how to call a tool. A tool is a program, but LLMs just produce text! Yes, I see you're telling the LLM about the tools, but what's next? And when I finally understood that there is no next, nothing to do beyond explaining, it felt pretty magical, not gonna lie.
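
For anyone still stuck at the "but what's next?" stage, the whole trick fits in a few lines. This is just a toy sketch: `call_llm`, the tool name, and the JSON format are made up for illustration, not any particular provider's API.

```python
# A toy sketch of "there's no next": the tool only ever exists as text in the
# prompt, and ordinary code parses whatever the model says back. call_llm and
# the JSON format here are invented for illustration.
import json

SYSTEM_PROMPT = """You have one tool:
  get_weather(city) -> current weather for that city.
To use it, reply with a single line of JSON like:
  {"tool": "get_weather", "city": "Paris"}
Otherwise just answer normally."""

def get_weather(city: str) -> str:
    # Stand-in tool; a real one would hit an API or run a command.
    return f"Sunny, 21 C in {city}"

def call_llm(system: str, user: str) -> str:
    # Stand-in for a real chat-completion call; returns a canned "tool call"
    # so the example runs without any provider.
    return '{"tool": "get_weather", "city": "Paris"}'

def step(user_message: str) -> str:
    reply = call_llm(SYSTEM_PROMPT, user_message)
    try:
        call = json.loads(reply)
    except json.JSONDecodeError:
        return reply  # plain-text answer, no tool wanted
    if call.get("tool") == "get_weather":
        return get_weather(call["city"])  # parse the text, run the code
    return reply

print(step("What's the weather in Paris?"))
```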

Tool documentation is already text, either directly (e.g. `tool -h`) or indirectly (e.g. `man tool`), plus there are countless examples of usage online, so it seems clear that a textual mapping between a tool and its usage already exists.

Running a command in a shell is just a string of text. LLMs produce text, and it's easy to write a program that executes that text as a process. I don't see what's magical about it at all.
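
Concretely, something like this (toy example; the model's output is hard-coded here since no particular API is assumed):

```python
# The "tool call" is just a string; executing it is one subprocess call.
# model_output is hard-coded here; in a real agent it comes from the LLM.
import subprocess

model_output = "echo hello from the model"  # text the LLM produced
result = subprocess.run(model_output, shell=True, capture_output=True, text=True)
print(result.stdout)  # this goes back into the conversation as the tool result
```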

It would seem magical if you think of LLMs as token predictors with zero understanding of what they're doing. This is evidence that there's more going on, though.