Comment by sailingparrot

4 months ago

Indeed, that's what I meant. The LLM isn't a blank slate at the start of each new token during autoregressive decoding, because the KV cache still holds the keys and values computed for all previous tokens.
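To make that concrete, here is a rough sketch of a single toy attention head in plain Python. Everything in it (the `ToyAttention` class, the `step` and `full` methods, the random weights and inputs) is made up for illustration and is not any real library's API. It only aims to show that a decoding step using the cached keys and values gives the same output as recomputing the whole prefix from scratch, so the state from earlier tokens really is carried forward.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q, keys, values):
    """Scaled dot-product attention of one query over all keys/values."""
    scale = math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]


class ToyAttention:
    """Hypothetical single-head attention layer with random weights."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)

        def mat():
            return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

        self.Wq, self.Wk, self.Wv = mat(), mat(), mat()

    def full(self, xs):
        """No cache: recompute keys/values for the whole prefix,
        then return the output at the last position."""
        keys = [matvec(self.Wk, x) for x in xs]
        values = [matvec(self.Wv, x) for x in xs]
        return attend(matvec(self.Wq, xs[-1]), keys, values)

    def step(self, x, cache):
        """With cache: compute key/value only for the new token,
        append them, and attend over everything cached so far."""
        cache["k"].append(matvec(self.Wk, x))
        cache["v"].append(matvec(self.Wv, x))
        return attend(matvec(self.Wq, x), cache["k"], cache["v"])


# Five random "token embeddings" of dimension 4 (made-up data).
rng = random.Random(1)
tokens = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]

layer = ToyAttention(dim=4)
cache = {"k": [], "v": []}

# Incremental decoding: one token at a time, reusing the cache.
cached_outputs = [layer.step(x, cache) for x in tokens]

# Reference: recompute the full prefix at every position.
full_outputs = [layer.full(tokens[:t + 1]) for t in range(len(tokens))]

max_diff = max(abs(a - b)
               for ca, fu in zip(cached_outputs, full_outputs)
               for a, b in zip(ca, fu))
print("max difference between cached and full recompute:", max_diff)
```

The cache grows by one key/value pair per token, so each step only does the projection work for the new token while still attending over everything that came before.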
