Comment by doph
11 hours ago
Is a KV cache not a kind of state? What does statefulness have to do with selfhood? How does a system prompt work at all if these things have no reference to themselves?
The KV cache is not persistent. It's hyper-short-term memory.
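A toy sketch of the distinction, with purely illustrative names (this is not any real inference library's API): the cache is genuine state, but it is scoped to a single generation call and is never written back into the weights.

```python
# Toy model: forward() consumes one token plus the cache built so far.
class ToyModel:
    def forward(self, token, kv_cache):
        kv_cache.append(token)             # stand-in for per-layer K/V tensors
        return (token + 1) % 10, kv_cache  # stand-in for sampling a next token

def generate(model, prompt_tokens, max_new_tokens):
    kv_cache = []                          # created fresh for this request
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token, kv_cache = model.forward(tokens[-1], kv_cache)
        tokens.append(next_token)
    return tokens                          # kv_cache is dropped on return:
                                           # nothing persists into the weights

print(generate(ToyModel(), [1, 2, 3], 5))  # [1, 2, 3, 4, 5, 6, 7, 8]
```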
Modern KV caches can hold up to 1 million tokens (~3000 pages of text). That's not so short; it's roughly 48 straight hours of reading.
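Back-of-envelope arithmetic behind those figures (the per-token, per-page, and reading-speed constants are rough rules of thumb, not exact):

```python
tokens = 1_000_000
words = tokens * 0.75      # ~0.75 English words per token (rule of thumb)
pages = words / 250        # ~250 words per printed page
hours = words / 250 / 60   # ~250 words per minute of reading

print(f"~{words:,.0f} words, ~{pages:,.0f} pages, ~{hours:.0f} hours")
# -> ~750,000 words, ~3,000 pages, ~50 hours
```

which lands in the same ballpark as the 3000-page / 48-hour figures above.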
Yes and no. It's not just text but images, video, etc., and it's not just the pages of content but all of the "thinking" as well. Plus, the models tend to work better earlier in the context.
I regularly come close to filling up the context window and have to compact the context. That can happen several times within one session of me working on a problem, which you could argue is roughly my own context window.
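For anyone unfamiliar, "compacting" is roughly this shape. A hypothetical sketch, where `summarize()` stands in for something like another call into the model itself, and the window and tail sizes are made-up constants:

```python
CONTEXT_LIMIT = 200_000   # tokens; assumed window size, for illustration
KEEP_RECENT = 50_000      # tail of the transcript kept verbatim

def maybe_compact(transcript_tokens, summarize):
    """summarize() is a stand-in, e.g. another call into the model."""
    if len(transcript_tokens) < CONTEXT_LIMIT:
        return transcript_tokens
    head = transcript_tokens[:-KEEP_RECENT]
    tail = transcript_tokens[-KEEP_RECENT:]
    return summarize(head) + tail  # lossy: detail in `head` is gone for good
```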
My point, though, was that almost none of the model's knowledge is in the context; it's all in the training. We have no functional long-term memory for LLMs beyond training.