Comment by simonw

4 hours ago

It's impossible to answer if you don't have a search tool, and three out of the five tested models didn't have a search tool.

Thanks; I didn't spot that they disabled tools in the harness. They also don't give the models an "out" to express uncertainty, so the instructions force a guess.

As an aside, though, it's still funny that the two models WITH search also disagreed.

It's impossible to answer unless you have a *100% complete search tool*.

No system can know everything, no matter how many tools you give it. It's always wrong to force a binary True / False with no room for "I don't know".
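
For what it's worth, here is a rough sketch of what a harness with an "out" could look like. This is not the harness under discussion; `Verdict`, `parse_verdict`, `score`, and the penalty value are all hypothetical names and choices made up for illustration. The idea is just that abstaining scores zero while a wrong guess costs something, so guessing is no longer forced.

```python
from enum import Enum


class Verdict(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"  # the "out": the model may decline to guess


def parse_verdict(reply: str) -> Verdict:
    """Map a free-text reply to a verdict (hypothetical parsing rule).

    Anything that does not start with "true" or "false" is treated
    as an abstention rather than being coerced into a guess.
    """
    text = reply.strip().lower()
    if text.startswith("true"):
        return Verdict.TRUE
    if text.startswith("false"):
        return Verdict.FALSE
    return Verdict.UNKNOWN


def score(verdict: Verdict, truth: bool, wrong_penalty: float = 1.0) -> float:
    """Score one answer: +1 if correct, 0 if abstained, -wrong_penalty if wrong.

    The penalty of 1.0 is an assumed default, not a recommendation.
    """
    if verdict is Verdict.UNKNOWN:
        return 0.0
    correct = (verdict is Verdict.TRUE) == truth
    return 1.0 if correct else -wrong_penalty
```

With this kind of scoring, a model without a search tool that answers "I don't know" comes out ahead of one that guesses wrong, instead of both being lumped together.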