Comment by kqr

2 days ago

On the other hand, since they "think in writing" they also do not keep any reasoning secret from us. Whatever they actually do is based on the past transcript plus training.

That writing isn't the only "thinking", though. Some thinking can happen in the course of generating a single token, as shown by the ability to answer a question without any intermediate reasoning tokens. But as we've all learnt, this is a less powerful and more error-prone mode of thinking.

That is to say, I think a small amount of secret reasoning is possible, e.g. if the location is known or guessed from the beginning by some other means and the reasoning steps are made up to justify the conclusion.

The more clearly sound the reasoning steps are, the less plausible that scenario is.

Right, but the reasoning/thinking is _also_ explained as being partially or completely performative. This is made obvious when mistakes that show up in the chain of thought _don't_ result in mistakes in the final answer (a fairly common phenomenon). It is also explained more simply by the training objective (next-token prediction) and the loss function encouraging plausible-looking answers.