You don't have to literally send a null token. Train it to generate text that summarizes the evidence that is available while stating how uncertain the final answer to the prompt is.
Transformers are not Markovian. Arguably their whole point is to be the opposite of Markovian: to efficiently make each new token a function of all previous tokens in the context window.
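To make the "function of all previous tokens" point concrete, here is a rough NumPy sketch of a single causal self-attention head. It is a toy with random weights, not any real model; the function name `causal_self_attention`, the sizes, and the perturbation test are all made up for illustration. The idea is just to check that nudging the very first token changes the output at the last position, which a first-order Markov model (next state depends only on the current one) would not allow.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head causal self-attention over a (seq_len, d) array (toy sketch)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Mask out future positions so position i attends only to positions <= i.
    future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Row-wise softmax; the diagonal is never masked, so each row max is finite.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
seq_len, d = 6, 8  # arbitrary illustrative sizes
x = rng.normal(size=(seq_len, d))
# Scale by 1/sqrt(d) so the attention weights are not sharply peaked.
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

base = causal_self_attention(x, w_q, w_k, w_v)

# Perturb only the FIRST token and recompute.
x_early = x.copy()
x_early[0] += 1.0
early = causal_self_attention(x_early, w_q, w_k, w_v)

# Perturb only the LAST token and recompute.
x_late = x.copy()
x_late[-1] += 1.0
late = causal_self_attention(x_late, w_q, w_k, w_v)

# Does the oldest token influence the newest position?
first_token_affects_last = not np.allclose(base[-1], early[-1])
# Does the newest token leak backwards into earlier positions?
last_token_affects_earlier = not np.allclose(base[:-1], late[:-1])

print("first token affects last position:", first_token_affects_last)
print("last token affects earlier positions:", last_token_affects_earlier)
```

With the causal mask in place, the first check should come out true and the second false: the last position sees the whole history directly, while earlier positions never see later tokens.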