Comment by yorwba

2 days ago

> In theory today, I should be able to process <long quote from document> <specific query> and just stop after the long document and save the KV cache right?

People do this, it's called prefix caching.

There's also https://arxiv.org/abs/2506.06266 where they compress the context down to a smaller representation they call a "cartridge," and composing cartridges from different contexts seems to work reasonably well.