Comment by og_kalu
2 days ago
>Humans can break things down and work through them step by step. The LLMs one-shot pattern match.
I've had LLMs break down problems, work through them, pivot when errors arise, and all that jazz. They're not perfect at it, and they're worse than humans, but it happens.
>Anthropic even showed that the reasoning models tended to work backwards: one shotting an answer and then matching a chain of thought to it after the fact.
This is also a failure mode that occurs in humans. A number of experiments suggest that human explanations are often post hoc rationalizations, even when people genuinely believe otherwise.
>If a human is capable of multiplying double digit numbers, they can also multiply those large ones.
Yeah, and some of them will make mistakes, and some of them will be less accurate than GPT-5. We didn't switch to calculators and spreadsheets just for the fun of it.
>GPT’s answer was orders of magnitude off. It resembles the right answer superficially but it’s a very different result.
GPT-5 on the site is a router that will give you who knows what model, so I tried your query with the API directly (GPT-5, medium thinking) and it gave me:
9.207337461477596e+27
When prompted to write out all the digits, it returned:
9,207,337,461,477,596,127,977,612,004.
You can replicate this if you use the API. Honestly, I'm surprised. I didn't realize the state of the art had become this precise.
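If you want to try it yourself, here's a rough sketch of the kind of check I mean. It assumes the openai Python SDK's Responses API and its `output_text` helper, and the two operands are placeholders (the factors from your original query aren't quoted here), so treat it as illustrative rather than exact replication:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder operands: the factors from the original query aren't quoted
# in this thread, so these two ~14-digit numbers are just illustrative.
a = 71_203_948_117_562
b = 129_310_077_448_802

# Assumes the Responses API with a reasoning-effort setting; adjust the
# model name / parameters to whatever your account actually exposes.
resp = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "medium"},
    input=f"Compute {a} * {b}. Reply with only the digits of the product.",
)

reported = int(resp.output_text.replace(",", "").strip())
exact = a * b  # Python ints are arbitrary precision, so this is ground truth

print("model reported:", reported)
print("exact product :", exact)
print("exact match   :", reported == exact)
print("relative error:", abs(reported - exact) / exact)
```

Since Python integers are arbitrary precision, `a * b` gives you the ground truth to compare the model's answer against, both for an exact match and for how far off a rounded or scientific-notation answer is.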
Now what? Does this prove you wrong?
This is kind of the problem. There's no sense in making gross generalizations, especially based on behavior that also manifests in humans.
LLMs don't understand some things well. Why not leave it at that?
Here is how GPT itself described LLM reasoning when I asked about it:
That at least sounds consistent with what I’ve been trying to say and what I’ve observed.