Comment by positron26
6 months ago
> And then during the next token, all of that bounded depth is thrown away except for the token of output.
You're fixating on the pseudo-computation within a single token pass. That is very limited compared to actual hidden-state retention and the introspection it would enable, if we already knew how to train for it and do online learning.
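A minimal sketch of the point being debated, with a hypothetical toy `forward` function standing in for a real transformer (the internal arithmetic is made up; only the control flow matters): in autoregressive decoding, everything computed inside a step is discarded when the step ends, and the sole state carried forward is the token sequence itself. A KV cache only memoizes those internals for speed; it adds no new channel between steps.

```python
def forward(tokens):
    # Stand-in for one transformer pass: does some bounded-depth
    # internal work, then returns a single next-token id. The
    # intermediate values vanish when the function returns.
    hidden = sum(tokens) % 7          # pretend internal computation
    hidden = (hidden * 3 + 1) % 11    # more throwaway internal state
    return hidden                     # only this token survives

def generate(prompt, n_steps):
    tokens = list(prompt)
    for _ in range(n_steps):
        next_token = forward(tokens)  # all other per-step state is gone
        tokens.append(next_token)     # the only channel between steps
    return tokens

print(generate([2, 5], 3))  # prints [2, 5, 1, 4, 5]
```

This is why "reasoning" traces are written out as tokens: with no persistent hidden state, the visible sequence is the model's only working memory across steps.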
> The "reasoning" hack would not be a realistic implementation choice if the models had hidden state and could ruminate on it without showing us output.
Sure. But note that "ruminate" is different from "introspect", which is what your original comment was about.