Comment by schneems
13 hours ago
They are bad at math. But they are good at writing code, and as an optimization some providers secretly have the model write code to answer the problem, run it, and give you the answer without telling you about the middle step.
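That hidden "write code, run it, surface the answer" step is essentially a code-interpreter tool call. A minimal sketch of the pattern, where `fake_model_generate` is a hypothetical stand-in for the real LLM call (no provider's actual API is shown):

```python
# Sketch of the hidden code-interpreter step: the model emits a small
# program, the provider executes it, and only the final answer is shown.
# fake_model_generate is a hypothetical stand-in for a real LLM call.

def fake_model_generate(question: str) -> str:
    # A real model would translate the question into code; here we
    # hard-code the program it might emit for an arithmetic question.
    return "result = 123456789 * 987654321"

def answer_with_code(question: str) -> int:
    code = fake_model_generate(question)
    scope: dict = {}
    exec(code, scope)          # the "run it" step, hidden from the user
    return scope["result"]     # only the final answer is surfaced

print(answer_with_code("What is 123456789 x 987654321?"))
```

The arithmetic is done by the Python runtime, not by the model's next-token prediction, which is why the surfaced answer is exact.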
Someone should tell the mathematicians that if they use a calculator or a whiteboard or, heaven forbid, a computer, they are "bad at math".
1) That's not related to the chain-of-thought comment I was replying to. Someone asked about the "bad at math" claim and pointed out "but it seems good to me", so I added some color on why that might be the case. Your retort seems to imply I'm arguing that because something uses tools for a job, it cannot be good at the thing it uses the tool for. That is not my position.
2) If you have something to say, just say it. Don't put words in my mouth and then argue with a thing I didn't say.
What would I do to demonstrate that they are bad at math? If by "maths" we mean things like working out a double integral for a joint probability problem, or anything simpler than that, GPT-5 has been flawless.
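For concreteness, here is the kind of problem meant: with joint density f(x, y) = x + y on the unit square, P(X + Y < 1) works out analytically to 1/3. A plain midpoint Riemann sum (no external libraries, just a numerical check of the closed form):

```python
# Numerical check of a joint-probability double integral:
# f(x, y) = x + y on [0,1]^2, with P(X + Y < 1) = 1/3 analytically.

def prob_x_plus_y_lt_1(n: int = 1000) -> float:
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h          # midpoint rule in x
        for j in range(n):
            y = (j + 0.5) * h      # midpoint rule in y
            if x + y < 1.0:        # restrict to the region X + Y < 1
                total += (x + y) * h * h
    return total

print(prob_x_plus_y_lt_1())        # converges toward 1/3 as n grows
```

Asking a model to set up and evaluate that integral symbolically is a fair test of "math"; asking it to grind out the arithmetic by hand is a different test.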
Search the topic. It is historically documented. It might no longer be true, though.
A way to test might be running an open model locally and prompting it directly (without a harness), so you could be sure it's not going through a translation layer. These days the tool-call behavior may be built into the model, but back in the day it was treated more like a magic trick. Without it, simple math failed in much the same way as "how many r's are in strawberry".
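The strawberry comparison fits because both failures share a cause: the model operates on subword tokens rather than individual characters or digits. A toy illustration (the token split shown is hypothetical; real tokenizer vocabularies differ):

```python
# Why letter-counting (and digit-by-digit arithmetic) is awkward for LLMs:
# they see subword tokens, not characters. The split below is a made-up
# example of how a tokenizer *might* carve the word; real vocabularies vary.

word = "strawberry"
hypothetical_tokens = ["str", "aw", "berry"]   # hypothetical subword split

# Character-level counting is trivial in code...
assert word.count("r") == 3

# ...but from token IDs alone the spelling is not directly visible: the
# model must have effectively memorized which letters each token contains.
per_token_r = [t.count("r") for t in hypothetical_tokens]
print(per_token_r)  # per-token counts, which sum to the true total of 3
```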
It is wildly not true.
The request is for some reasonable math problem that a model like GPT or Claude will fail at. I'm not going to set up a local model or some harness for it; I'm just going to copy/paste it into ChatGPT and watch it solve it.
Propose a problem, if you think I'm wrong about this. Seems simple.
Are they bad at math? Or are they bad at arithmetic?
if you don't know much math, it's easy to confuse the two
Neither.