Comment by jmalicki
11 hours ago
Have you seen documentation that the thoughts in Claude Code are slipped out separately, authoritative or otherwise? I've heard this claimed a few times and I'm wondering what they're doing differently from traditional thinking models.
What people typically mean by the GP statement is that the “thinking” mode of these models is loosely analogous to what humans do: a bit of a retrograde reconstruction of how we arrived at a gestalt conclusion that sounds good, but may not accurately reflect the real logic at play.
IME you can see this more easily with less-polished models like Deepseek 3.X, where the reasoning in the thinking traces occasionally contradicts or has zero bearing on the non-thinking output.
Of course that can happen!
But they are nonetheless actual tokens that were produced, and they are then read as part of the prompt during answer generation. And the hidden state of course carries a ton of logic that may not be apparent from the tokens produced!
Unlike in humans, this thinking cannot possibly be retrograde: generation is autoregressive and attention is causally masked, so the thinking tokens are strictly generated before the answer and cannot be affected by it (though the model may already have some concept of an answer by the time it starts generating the thinking tokens, and there is no guarantee that the thinking tokens are actually attended to when the answer is generated).
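To make the causal masking point concrete, here is a rough toy sketch in plain Python. It is not how any real model is implemented: the "tokens" are random vectors, there are no learned projections, and names like `causal_attention`, `n_think`, and `n_answer` are made up for illustration. It only shows the direction of information flow: changing later ("answer") positions leaves earlier ("thinking") outputs untouched, while changing an earlier position can change later ones.

```python
import math
import random

def causal_attention(x):
    """Single-head self-attention with a causal mask over a list of vectors.

    Toy setup (an assumption for illustration): queries, keys and values
    are all the raw input vectors, with no learned projections.
    """
    d = len(x[0])
    out = []
    for i, q in enumerate(x):
        # Causal mask: position i may only look at positions 0..i.
        scores = [sum(a * b for a, b in zip(q, x[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w / total * x[j][k] for j, w in enumerate(weights))
                    for k in range(d)])
    return out

random.seed(0)
n_think, n_answer, d = 4, 3, 8
tokens = [[random.gauss(0, 1) for _ in range(d)]
          for _ in range(n_think + n_answer)]
base = causal_attention(tokens)

# Swap in different "answer" vectors; the "thinking" positions come first.
changed_answer = tokens[:n_think] + [
    [random.gauss(0, 1) for _ in range(d)] for _ in range(n_answer)
]
after_answer_change = causal_attention(changed_answer)
thinking_unchanged = after_answer_change[:n_think] == base[:n_think]

# Change one "thinking" vector instead; the answer positions can see it.
changed_thinking = [row[:] for row in tokens]
changed_thinking[1] = [v + 1.0 for v in changed_thinking[1]]
after_thinking_change = causal_attention(changed_thinking)
answer_changed = after_thinking_change[n_think:] != base[n_think:]

print(thinking_unchanged, answer_changed)
```

Note that the softmax weight an answer position puts on any given thinking position can be arbitrarily small, which is the "no guarantee the thoughts are actually attended to" caveat: the mask only says what can be read, not what is.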