← Back to context

Comment by hypoxia

2 days ago

Did you try it with high reasoning effort?

Sorry, not directed at you specifically. But every time I see questions like this I can’t help but rephrase in my head:

“Did you try running it over and over until you got the results you wanted?”

  • This is not a good analogy because reasoning models are not choosing the best from a set of attempts based on knowledge of the correct answer. It really is more like what it sounds like: “did you think about it longer until you ruled out various doubts and became more confident?” Of course nobody knows quite why directing more computation in this way makes them better, and nobody seems to take the reasoning trace too seriously as a record of what is happening. But it is clear that it works!

    • > Of course nobody knows quite why directing more computation in this way makes them better, and nobody seems to take the reasoning trace too seriously as a record of what is happening. But it is clear that it works!

      One thing it's hard to wrap my head around is that we are giving more and more trust to something we don't understand with the assumption (often unchecked) that it just works. Basically your refrain is used to justify all sorts of odd setup of AIs, agents, etc.

      1 reply →

    • Bad news: it doesn't seem to work as well as you might think: https://arxiv.org/pdf/2508.01191

      As one might expect, because the AI isn't actually thinking, it's just spending more tokens on the problem. This sometimes leads to the desired outcome but the phenomenon is very brittle and disappears when the AI is pushed outside the bounds of its training.

      To quote their discussion, "CoT is not a mechanism for genuine logical inference but rather a sophisticated form of structured pattern matching, fundamentally bounded by the data distribution seen during training. When pushed even slightly beyond this distribution, its performance degrades significantly, exposing the superficial nature of the “reasoning” it produces."

      14 replies →

  • What you describe is a person selecting the best results, but if you can get better results one shot with that option enabled, it’s worth testing and reporting results.

    • I get that. But then if that option doesn't help, what I've seen is that the next followup is inevitably "have you tried doing/prompting x instead of y"

      6 replies →