Comment by Dansvidania

21 days ago

sorry if it's a stupid question, but isn't generating valid json tool call in the middle of prose the way tool calling works? what is that missing?

Not stupid at all!

Some of the older models did do this (like 3.5-era ish I think), and the harness would parse the results.

The newer way frontier has setup is structured tool calls. `tool_use` or `tool_calls`. The response is then received as a different tool_result rather than a regular message. That's a bit of the newer way of doing it.

The failure mode in question is more the model mixing the two: "Sure, I'll read the file: {"tool": "read", "args": {"path": "foo"}}" - that'll break stuff. Other failure modes are the json not parsing when sent it as a structured call, and in some cases the model just emitting text and forgetting the tool call.