
Comment by Chance-Device

9 hours ago

Let’s see, I think these pretty much map out a little chronology of the research:

https://arxiv.org/abs/2112.00114
https://arxiv.org/abs/2406.06467
https://arxiv.org/abs/2404.15758
https://arxiv.org/abs/2512.12777

First that scratchpads matter, then why they matter, then that they don’t even need to be meaningful tokens, then a conceptual framework for the whole thing.

I don’t see the relevance. The discussion is over whether boilerplate text that occurs intermittently in the output, purely for the sake of linguistic correctness or sounding professional, is of any benefit. Chain of thought doesn’t look like that to begin with; it’s a contiguous block of text.

  • To boil it down: chain of thought isn’t really a chain of thought; it’s just more generated tokens appended to the context. Those tokens participate in computations in subsequent forward passes that do things we don’t see or even understand. More LLM-generated context matters (see the decoding-loop sketch after these replies).

  • That is not how CoT works: it is all in the context, and all of it is influenced by the context. This is a common and significant misunderstanding of autoregressive models, and I see it on HN a lot.

  • "I don't see the relevance" -- and casually dismisses years of research without even trying to read those papers.
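
Both replies turn on the same mechanism, so a minimal sketch may help. This is not any particular library's API; `model` and `sample` are hypothetical stand-ins for a real LLM forward pass and sampling step. The point it illustrates: an autoregressive decoder appends every generated token, CoT or otherwise, to the context, and the full context feeds every subsequent forward pass.

```python
# Minimal sketch of an autoregressive decoding loop.
# model() and sample() are hypothetical stand-ins, not a real API.

def generate(model, sample, prompt_tokens, max_new_tokens):
    context = list(prompt_tokens)
    for _ in range(max_new_tokens):
        logits = model(context)      # forward pass over the entire context so far
        next_token = sample(logits)  # pick the next token from the distribution
        context.append(next_token)   # the new token now conditions all later passes
    return context
```

Nothing in the loop distinguishes "reasoning" tokens from any other tokens; whatever benefit CoT provides has to come from how those extra tokens condition later forward passes.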