Comment by zambelli

21 days ago

Not stupid at all!

Some of the older models did do this (like 3.5-era ish I think), and the harness would parse the results.

The newer way frontier has setup is structured tool calls. `tool_use` or `tool_calls`. The response is then received as a different tool_result rather than a regular message. That's a bit of the newer way of doing it.

The failure mode in question is more the model mixing the two: "Sure, I'll read the file: {"tool": "read", "args": {"path": "foo"}}" - that'll break stuff. Other failure modes are the json not parsing when sent it as a structured call, and in some cases the model just emitting text and forgetting the tool call.