Comment by schneems
3 hours ago
Search the topic. It is historically documented. It might no longer be true though.
A way to test might be to run an open model locally and directly (without a harness), so you could be sure it's not going through a translation layer. These days the tool-call behavior may be built in, but back in the day it was treated more like a magic trick. Without it, the model failed at simple math in much the same way as the classic "how many r's are in strawberry" question.
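The test described above boils down to inspecting the raw completion for tool-call markers instead of an in-text answer. Here is a minimal sketch of that check; the specific marker strings and the `calculator` tool name are assumptions (they vary by model and chat template), and the actual local-model call is left as a comment:

```python
import re

# Markers some open chat models emit when tool calling is built in.
# These exact strings are assumptions; check your model's chat template.
TOOL_CALL_MARKERS = [
    r"<tool_call>",                 # e.g. some Qwen-style templates
    r"<\|python_tag\|>",            # e.g. some Llama-style templates
    r'"name"\s*:\s*"calculator"',   # generic JSON tool call (hypothetical tool name)
]

def looks_like_tool_call(raw_output: str) -> bool:
    """True if the raw completion contains a tool-call marker, i.e. the
    model tried to delegate the math rather than answer in-text."""
    return any(re.search(p, raw_output) for p in TOOL_CALL_MARKERS)

# With a real local model (llama.cpp, vLLM, etc.) you would feed it a math
# prompt with no harness and inspect the raw completion, e.g.:
#   raw = llm("What is 78234 * 91857?")
# Here we just demonstrate the detector on canned outputs.
print(looks_like_tool_call("12 * 12 = 144"))                      # False: plain answer
print(looks_like_tool_call('<tool_call>{"name": "calculator"}'))  # True: delegated
```

If the marker shows up, the arithmetic was delegated to a tool; if not, whatever answer appears is the model's own.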
It is wildly untrue.
The request is for a reasonable math problem that a model like GPT or Claude will fail at. I'm not going to set up a local model or a harness for it; I'll just paste it into ChatGPT and watch it solve the problem.
Propose a problem if you think I'm wrong about this. Seems simple.