Comment by cyanydeez

12 hours ago

The problem is that a null answer will stop the "Markov" chain.

So, that's all.

You don't have to literally send a null token. Train it to generate text that summarizes the evidence that is there but also states the uncertainty of the final answer to the prompt.

Transformers are not Markovian. Their whole point is arguably to be the reverse of Markovian: to efficiently make each new token a function of all previous tokens.
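
To make the distinction concrete, here is a rough toy sketch, not a transformer: two lookup-table predictors trained on a made-up two-sentence corpus. The names `train_markov`, `train_full_context`, and `predict` are hypothetical and exist only for this illustration. The first-order Markov table conditions on the last token alone; the other conditions on the entire prefix, which is the dependency structure attention gives you (within the context window).

```python
from collections import Counter, defaultdict

# Made-up corpus for illustration only.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
]

def train_markov(seqs):
    """First-order Markov table: next token depends only on the previous token."""
    table = defaultdict(Counter)
    for seq in seqs:
        for prev, nxt in zip(seq, seq[1:]):
            table[prev][nxt] += 1
    return table

def train_full_context(seqs):
    """Full-context table: next token depends on the entire prefix so far."""
    table = defaultdict(Counter)
    for seq in seqs:
        for i in range(1, len(seq)):
            table[tuple(seq[:i])][seq[i]] += 1
    return table

def predict(table, key):
    """Return the most frequent continuation for key, or None if unseen."""
    counts = table.get(key)
    return counts.most_common(1)[0][0] if counts else None

markov = train_markov(corpus)
full = train_full_context(corpus)

prefix_a = "the cat sat on the".split()
prefix_b = "the dog sat on the".split()

# The Markov model only sees the last token ("the"), so it cannot tell
# the two prefixes apart and must give the same answer for both.
markov_a = predict(markov, prefix_a[-1])
markov_b = predict(markov, prefix_b[-1])

# The full-context model sees "cat" vs "dog" earlier in the prefix.
full_a = predict(full, tuple(prefix_a))
full_b = predict(full, tuple(prefix_b))

print(markov_a == markov_b)  # the two prefixes are indistinguishable
print(full_a, full_b)
```

A real transformer computes this dependence with attention rather than a table, and generalizes to prefixes it has never seen, but the point is the same: the prediction is a function of the whole history, not just the last state.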