Comment by zambelli
21 days ago
Not stupid at all!
Some of the older models did do this (like 3.5-era ish I think), and the harness would parse the results.
The newer way the frontier labs have set this up is structured tool calls: `tool_use` or `tool_calls`, depending on the API. The tool's output then comes back as a separate `tool_result` rather than as a regular message.
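Rough sketch of that round trip, assuming an Anthropic-style block shape. The `read` tool, the `toolu_01` id, and the `run_tool` helper are all made up for illustration, not anyone's actual harness:

```python
import json

# Hypothetical tool registry; "read" is a stand-in and does not touch the disk.
TOOLS = {"read": lambda args: f"<contents of {args['path']}>"}


def run_tool(block: dict) -> dict:
    """Turn one structured tool_use block into a matching tool_result block."""
    output = TOOLS[block["name"]](block["input"])
    # The result is tied back to the call by id, not by parsing prose.
    return {"type": "tool_result", "tool_use_id": block["id"], "content": output}


# What the model emits: a structured block the harness can dispatch directly.
tool_use = {
    "type": "tool_use",
    "id": "toolu_01",  # made-up id
    "name": "read",
    "input": {"path": "foo"},
}

result = run_tool(tool_use)
print(json.dumps(result))
```

The point is just that the harness never has to dig the call out of free text; it gets a name, an id, and already-structured arguments.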
The failure mode in question is more the model mixing the two: "Sure, I'll read the file: {"tool": "read", "args": {"path": "foo"}}" - that'll break stuff. Other failure modes are the JSON not parsing when it's sent as a structured call, and in some cases the model just emitting text and forgetting the tool call.
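If it helps, here's a minimal sketch of how a harness might tell those cases apart. Everything here is an assumption on my part: the `classify` function, the label strings, and the idea that each structured call arrives as a dict with an `arguments` JSON string (OpenAI-style). Note that a text-only turn can also be a perfectly fine final answer, so the harness still has to decide whether a call was expected.

```python
import json

_decoder = json.JSONDecoder()


def has_inline_call(text: str) -> bool:
    """True if the prose contains a JSON object with a "tool" key."""
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`.
            obj, _ = _decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and "tool" in obj:
            return True
        start = text.find("{", start + 1)
    return False


def classify(text: str, tool_calls: list) -> str:
    """Label one model turn; the label names are hypothetical."""
    for call in tool_calls:
        try:
            args = json.loads(call.get("arguments"))
        except (json.JSONDecodeError, TypeError):
            return "bad_json"  # structured call whose arguments don't parse
        if not isinstance(args, dict):
            return "bad_json"
    if tool_calls:
        return "ok"
    if has_inline_call(text):
        return "inline_call_in_text"  # the "mixing the two" case
    return "text_only"  # maybe a forgotten call, maybe a normal reply


mixed = 'Sure, I\'ll read the file: {"tool": "read", "args": {"path": "foo"}}'
print(classify(mixed, []))
print(classify("", [{"name": "read", "arguments": '{"path": "foo"'}]))
```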