Comment by vanviegen

9 hours ago

Wrong on both counts. The kv-cache is likely to be offloaded to RAM or disk. What you have locally is just the log of messages. The kv-cache is the internal LLM state after having processed these messages, and it is a lot bigger.

1 comment

vanviegen

cortesoft 1 hour ago

I shouldn't have said 'loaded into GPU memory', but my point still stands... the cached data is on the anthropic side, which means that caching more locally isn't going to help with that.