
Comment by adastra22

15 hours ago

Well, it's not tracking. As it predicts each token, it samples from a probability distribution: the matrix multiplies produce a score for every token in the vocabulary, and a softmax turns those scores into probabilities. It then picks a token randomly according to that distribution. How flat or how spiky that distribution is tells you how confident the model is in its answer.
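To make "flat vs. spiky" concrete, here is a minimal sketch in plain Python. The logit values are made up for illustration, and using Shannon entropy as the confidence measure is my assumption, not something a particular model exposes. Lower entropy means a spikier, more confident distribution.

```python
import math
import random

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy in nats: 0 for a one-hot distribution,
    # log(len(probs)) for a perfectly flat one.
    return -sum(p * math.log(p) for p in probs if p > 0)

def sample(probs, rng):
    # Pick a token index at random, weighted by its probability.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Made-up logits over a 4-token vocabulary (illustrative only).
spiky = softmax([10.0, 0.0, 0.0, 0.0])  # model strongly prefers token 0
flat = softmax([1.0, 1.0, 1.0, 1.0])    # model has no preference

rng = random.Random(0)
token = sample(spiky, rng)

print("spiky entropy:", entropy(spiky))
print("flat entropy: ", entropy(flat))
print("sampled token:", token)
```

The flat case comes out at exactly log(4) nats, the maximum for four tokens, while the spiky case is close to zero.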

But it then throws that distribution away: only the token it picked feeds into the next token calculation. So it's not really tracking its confidence per se.
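A rough sketch of that loop, with the caveat that `toy_logits` is a hypothetical stand-in I made up in place of a real model's forward pass. The point is only the data flow: the distribution is computed, sampled from once, and never stored.

```python
import math
import random

VOCAB = ["a", "b", "c", "<eos>"]

def toy_logits(context):
    # Hypothetical stand-in for a model forward pass: favors the token
    # that follows the last one in VOCAB order. Not a real model.
    last = context[-1] if context else 0
    favored = (last + 1) % len(VOCAB)
    return [2.0 if i == favored else 0.0 for i in range(len(VOCAB))]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generate(steps, rng):
    context = []  # holds sampled token ids only
    for _ in range(steps):
        probs = softmax(toy_logits(context))
        token = rng.choices(range(len(VOCAB)), weights=probs, k=1)[0]
        context.append(token)
        # probs is overwritten on the next iteration; nothing about how
        # flat or spiky it was survives in the context.
    return context

tokens = generate(5, random.Random(0))
print([VOCAB[t] for t in tokens])
```

If you wanted the model's confidence over time, you would have to record `probs` (or a summary of it) yourself at each step; the loop as written keeps none of it.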