Comment by dTal
11 hours ago
"Thinking mode" only provides the illusion of debuggability. It improves performance by generating more tokens which hopefully steer the context towards one more likely to produce the desired response, but the tokens it generates do not reflect any sort of internal state or "reasoning chain" as we understand it in human cognition. They are still just stochastic spew. You have no more insight into why the model generates the particular "reasoning steps" it does than you do into any other output, and neither do you have insight into why the reasoning steps lead to whatever conclusion it comes to. The model is much less constrained by the "reasoning" than we would intuit for a human - it's entirely capable of generating an elaborate and plausible reasoning chain which it then completely ignores in favor of some invisible built-in bias.
I'm always amused when I see comments saying, "I asked it why it produced that answer, and it said...." Sorry, you've badly misunderstood how these things work. It's not analyzing how it got to that answer. It's producing what it "thinks" the response to that question should look like.