Comment by cstrahan
3 months ago
The LLM's output differentiates between text intended for the user to see and tool usage.
You might be thinking "but I've never seen any sort of metadata in textual output from LLMs, so how does the client/agent know?"
To which I will ask: when you loaded this page in your browser, did you see any HTML tags, CSS, etc.? No. But that's only because your browser read the HTML and rendered the page, hiding the markup from you.
Similarly, what the LLM generates looks quite different compared to what you'll see in typical, interactive usage.
See for example: https://platform.openai.com/docs/guides/function-calling
The LLM might generate something like this for text (these examples follow Anthropic's Messages API shape):
{
  "content": [
    {
      "type": "text",
      "text": "Hello there!"
    }
  ],
  "role": "assistant",
  "stop_reason": "end_turn"
}
Or this for a tool call:
{
  "content": [
    {
      "type": "tool_use",
      "id": "toolu_abc123",
      "name": "get_current_weather",
      "input": {
        "location": "Boston, MA"
      }
    }
  ],
  "role": "assistant",
  "stop_reason": "tool_use"
}
The schema is enforced much like user-facing structured outputs are -- if you're not familiar, many services let you constrain the model's output so that it validates against a given schema. See for example:
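To make "validate against a given schema" concrete, here's a rough sketch using the jsonschema Python package (the weather schema here is made up). Note that real services typically enforce the schema during decoding, token by token, rather than checking after the fact:

import jsonschema  # pip install jsonschema

# Hypothetical schema for the get_current_weather arguments above.
weather_args_schema = {
    "type": "object",
    "properties": {"location": {"type": "string"}},
    "required": ["location"],
    "additionalProperties": False,
}

# Conforming output passes silently.
jsonschema.validate({"location": "Boston, MA"}, weather_args_schema)

# Non-conforming output raises ValidationError.
try:
    jsonschema.validate({"city": "Boston"}, weather_args_schema)
except jsonschema.ValidationError as e:
    print("rejected:", e.message)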