Comment by throwawayk7h

5 days ago

I thought it might do better if I asked it to do long-form multiplication specifically rather than trying to vomit out an answer without any intermediate tokens. But surprisingly, I found it doesn't do much better.

Other comments indicate that asking it to do long multiplication does work, and the varying results make sense: LLMs are probabilistic, so you probably just rolled an unlikely result.
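For reference, the long-form procedure being asked of the model is just the schoolbook algorithm: one partial product per digit, summed with place-value shifts. A minimal sketch (the explicit `partials` list plays the role of the intermediate tokens):

```python
def long_multiply(a: int, b: int) -> int:
    # Schoolbook long multiplication: one partial product per digit of b,
    # each shifted left by its place value, then summed at the end.
    partials = []
    for place, digit in enumerate(reversed(str(b))):
        partials.append(int(digit) * a * 10 ** place)
    return sum(partials)

print(long_multiply(1234, 5678))  # 7006652
```

Writing out every partial product externalizes the carries and shifts, which is exactly the work a chain-of-thought trace offloads into tokens instead of a single forward pass.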

  • Specifically, you need to use a reasoning model. Applying more test-time compute is analogous to Kahneman's System 2 thinking, while taking the first direct output of an LLM is analogous to System 1.

    This is true for solving difficult novel problems as well, with the addition of tools that an agent can use to research the problem autonomously.