
Comment by jlokier

3 hours ago

That price at Vultr gets you 1GB of RAM and 25GB of relatively slow SSD.

The KV cache of your Claude context is:

- Potentially much larger than 25GB. (The KV cache sizes you see people quoting for local models are for smaller models.)

- While it's being used, it's all in RAM.

- Actually, it's held in special high-performance GPU memory (HBM), stacked and bonded into the same package as the silicon of ludicrously expensive, state-of-the-art GPUs.

- The KV cache memory has to be many thousands of times faster than the storage behind your 25GB.

- It's much more expensive per GB than the CPU memory used by a VM. And that in turn is much more expensive than the SSD storage of your 25GB.

- Because far more people (and their agents) use Claude than rent VMs, far more people are competing for that expensive memory at the same time.
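To put rough numbers on the size point, here is a back-of-the-envelope sketch using the usual transformer KV cache formula: one key and one value vector per KV head, per layer, per token. Claude's real architecture isn't public, so both model shapes below are hypothetical, chosen only to illustrate the scale: a "large" model with 80 layers, 8 KV heads (grouped-query attention) and head dimension 128, and a "small local" model with 32 layers and the same head layout. Both assume 16-bit values.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Approximate KV cache size for one context.

    Factor 2 = one key vector plus one value vector per KV head,
    per layer, per token. bytes_per_value=2 assumes fp16/bf16.
    """
    return 2 * num_layers * num_kv_heads * head_dim * context_tokens * bytes_per_value


# Hypothetical shapes, for illustration only (not any vendor's real model).
large = kv_cache_bytes(num_layers=80, num_kv_heads=8, head_dim=128, context_tokens=200_000)
small = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, context_tokens=8_192)

print(f"large model, 200k-token context: {large / 1e9:.1f} GB")  # 65.5 GB
print(f"small local model, 8k context:   {small / 1e9:.1f} GB")  # 1.1 GB
```

Under these assumptions a single long context in the larger model already exceeds the VM's 25GB disk, while the small-model figure is the kind of number people quote for local setups. Using more KV heads or more layers scales the result up linearly.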

A lot of work goes into moving KV cache state between GPU memory and dedicated, cheaper storage on demand, as different users need different state. But the KV cache is so large, and is used in its entirety whenever the context is active, that moving it around is expensive too.
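A rough sketch of why shuffling that state around hurts: the time to move a cache is just its size divided by the bandwidth of whatever link it crosses. The bandwidth figures below are assumed, order-of-magnitude values for illustration (on-package GPU memory, a PCIe-class host link, and a budget VPS SSD), not measurements of any particular provider's hardware. The cache size reuses the hypothetical 200k-token figure from a hypothetical large model.

```python
# Hypothetical cache: 200k tokens in a hypothetical 80-layer, 8-KV-head, fp16 model.
KV_BYTES = 65_536_000_000

# Assumed, order-of-magnitude bandwidths in bytes per second.
BANDWIDTHS = {
    "GPU HBM": 3e12,
    "host link (PCIe-class)": 50e9,
    "budget VPS SSD": 0.5e9,
}


def transfer_seconds(num_bytes, bytes_per_second):
    """Idealised transfer time: size / bandwidth, ignoring latency and contention."""
    return num_bytes / bytes_per_second


for name, bw in BANDWIDTHS.items():
    print(f"{name:>24}: {transfer_seconds(KV_BYTES, bw):9.3f} s")
```

With these assumptions, reading the whole cache from HBM takes about 22 ms, pulling it across a host link takes over a second, and a slow SSD would need over two minutes. The HBM-to-SSD ratio here is about 6000x, which is the "many thousands of times faster" gap.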