Comment by barrkel

6 months ago

The recurrence comes from replaying tokens during autoregression.

It's as if you have a variable in a deterministic programming language, only you have to replay the entire history of the program's computation and input to get the next state of the machine (program counter + memory + registers).

Producing a token for an LLM is analogous to a tick of the clock for a CPU. It's the crank handle that drives the process.