Comment by yorwba
2 days ago
> In theory today, I should be able to process <long quote from document> <specific query> and just stop after the long document and save the KV cache right?
People do this, it's called prefix caching.
There's also https://arxiv.org/abs/2506.06266 where they compress the context down to a smaller representation they call a "cartridge," and composing cartridges from different contexts seems to work reasonably well.
No comments yet
Contribute on Hacker News ↗