Comment by simonw
4 hours ago
It's impossible to answer if you don't have a search tool, and three out of the five tested models didn't have a search tool.
Thanks; I didn't spot that they disabled tools in the harness. Also, they don't provide an "out" that lets the models express uncertainty, so the instructions force a guess.
As an aside, though, it's still funny that the two models WITH search also disagreed.
It's impossible to answer unless you have a *100% complete search tool*.
No system can know everything, no matter how many tools you give it. It's always wrong to force a binary True/False answer without allowing shades of "I don't know".