Comment by zamalek

5 days ago

LLMs are notoriously terrible at multiplying large numbers: https://claude.ai/share/538f7dca-1c4e-4b51-b887-8eaaf7e6c7d3

> Let me calculate that. 729,278,429 × 2,969,842,939 = 2,165,878,555,365,498,631

Real answer is: https://www.wolframalpha.com/input?i=729278429*2969842939

> 2 165 842 392 930 662 831

Your example seems short enough to not pose a problem.

Modern LLMs, just like everyone reading this, will instead reach for a calculator to perform such tasks. I can't do that in my head either, but a Python script can, so that's what any tool-using LLM will (and should) do.
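As a sketch of what that tool call amounts to (using the thread's numbers — Python integers are arbitrary-precision, so the result is exact):

```python
# The entirety of the "calculator" a tool-using LLM needs here:
# Python ints are arbitrary-precision, so there is no rounding.
a = 729_278_429
b = 2_969_842_939
print(a * b)  # → 2165842392930662831, matching the WolframAlpha result
```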

  • This is special pleading.

    Long multiplication is a trivial form of reasoning that is taught at the elementary level. Furthermore, the LLM isn't doing things "in its head" - the headline feature of GPT LLMs is attention across all previous tokens, all of its "thoughts" are on paper. That was Opus with extended reasoning; it had every opportunity to get it right, but didn't. There are people who can quickly multiply such numbers in their head (I am not one of them).

    LLMs don't reason.

    • I tried this with Claude - it has to be explicitly instructed to not make an external tool call, and it can get the right answer if asked to show its work long-form.

    • Mathematics is not the only kind of reasoning, so your conclusion is false. The human brain also has compartments for different types of activities. Why shouldn't an AI be able to use tools to augment its intelligence?

      3 replies →

    • > Furthermore, the LLM isn't doing things "in its head" - the headline feature of GPT LLMs is attention across all previous tokens, all of its "thoughts" are on paper

      LOL, talk about special pleading. Whatever it takes to reshape the argument into one you can win, I guess...

      LLMs don't reason.

      Let's see you do that multiplication in your head. Then, when you fail, we'll conclude you don't reason. Sound fair?

      11 replies →

      I assert that, by your evidentiary standards, humans don't reason.

      Presumably one of us is wrong.

      Therefore, humans don't reason.

  • LLMs don't use tools. Systems that contain LLMs are programmed to use tools under certain circumstances.

    • you’re just abstracting it away into this new “systems” definition

      when someone says LLMs today, they obviously mean software that does more than just text. If you want to be extra pedantic, you can even say LLMs by themselves can't even generate text, since they are just model files unless you add them to a "system" that makes use of those model files, doh

      1 reply →
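For reference, the long multiplication the thread keeps invoking is a short loop: one place-shifted partial product per digit of the multiplier, summed as you go. A minimal sketch of the schoolbook procedure:

```python
def long_multiply(a: int, b: int) -> int:
    """Schoolbook long multiplication: one place-shifted partial
    product per digit of the multiplier, added to a running total."""
    total, shift = 0, 0
    while b > 0:
        digit = b % 10                    # lowest remaining digit of b
        total += a * digit * 10 ** shift  # partial product, place-shifted
        b //= 10
        shift += 1
    return total

print(long_multiply(729_278_429, 2_969_842_939))  # → 2165842392930662831
```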

This hasn't been true for a while now.

I asked Gemini 3 Thinking to compute the multiplication "by hand." It showed its work and checked its answer by casting out nines and then by asking Python.
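Casting out nines, for anyone unfamiliar, is a cheap mod-9 consistency check rather than a proof of correctness — it catches most single-digit slips. Applied to the thread's numbers:

```python
# Casting out nines: a claimed product is consistent only if its value
# mod 9 equals the product of the factors' values mod 9.
a, b = 729_278_429, 2_969_842_939
claimed = 2_165_842_392_930_662_831  # WolframAlpha's answer from the thread
consistent = (a % 9) * (b % 9) % 9 == claimed % 9
print(consistent)  # → True (both sides reduce to 8)
```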

Sonnet 4.6 with Extended Thinking on also computed it correctly with the same prompt.

This doesn’t address the author’s point about novelty at all. You don’t need 100% accuracy to have the capability to solve novel problems.

I thought it might do better if I asked it to do long-form multiplication specifically, rather than vomiting out an answer without any intermediate tokens. But surprisingly, I found it didn't do much better.

  • Other comments indicate that asking it to do long multiplication does work, but the varying results make sense: LLMs are probabilistic, so you probably rolled an unlikely result.

    • Specifically, you need to use a reasoning model. Applying more test time compute is analogous to Kahneman's System 2 thinking, while directly taking the first output of an LLM is analogous to System 1.

      This is true for solving difficult novel problems as well, with the addition of tools that an agent can use to research the problem autonomously.