Comment by 8note
6 months ago
Do LLMs consider future tokens when making next-token predictions?
E.g., pick 'the' as the next token because there's a strong probability of 'planet' as the token after?
Is it only past state that influences the choice of 'the'? Or is the model predicting many tokens in advance and only returning the first one in the output?
If it does predict many, I'd consider that state hidden in the model weights.
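To make the question concrete, here's a minimal sketch of the standard decoding loop (assuming the Hugging Face transformers library, with GPT-2 as a stand-in model; the prompt is just an example). Each forward pass returns a distribution over only the next token, so anything the model "knows" about later tokens would have to live in its hidden activations rather than in what it returns:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

ids = tokenizer("The fruit they ate was", return_tensors="pt").input_ids
for _ in range(5):
    with torch.no_grad():
        logits = model(ids).logits[0, -1]        # scores for the next token only
    next_id = torch.argmax(logits)               # greedy pick of a single token
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)  # append and repeat

print(tokenizer.decode(ids[0]))
```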
I think recent Anthropic work showed that they "plan" future tokens in advance in an emergent way:
https://www.anthropic.com/research/tracing-thoughts-language...
oo thanks!
The most obvious case of this is `an apple` vs `a pear`. LLMs essentially never get the a/an distinction wrong, because their internal state already 'knows' the word that'll come next.
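A rough way to probe this (a sketch, assuming Hugging Face transformers and GPT-2 as a stand-in; the prompts are made up for illustration) is to compare the probability the model assigns to " a" vs " an" as the very next token. If the split shifts when the context biases the upcoming noun toward a vowel-initial word, the article choice is already carrying information about the token after it:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def article_probs(prompt: str):
    """Return P(' a') and P(' an') as the immediate next token for `prompt`."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]          # next-token logits only
    probs = torch.softmax(logits, dim=-1)
    a_id = tokenizer.encode(" a")[0]
    an_id = tokenizer.encode(" an")[0]
    return probs[a_id].item(), probs[an_id].item()

# A context that points toward a vowel-initial noun (e.g. "apple") should
# shift mass from " a" toward " an" if the article reflects the upcoming word.
for prompt in ["For lunch she ate",
               "The orchard grew nothing but apples, so for lunch she ate"]:
    p_a, p_an = article_probs(prompt)
    print(f"{prompt!r}: P(' a')={p_a:.4f}  P(' an')={p_an:.4f}")
```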
If I give an LLM a fragment of text that ends with "The fruit they ate was an <TOKEN>", then regardless of any plan, the grammatically correct answer is forced to be a noun starting with a vowel sound. How do you disentangle the grammar from the planning?
There's going to be a lot more "an apple" in the corpus than "an pear".