Comment by theshrike79
4 hours ago
Think of it like this: Anthropic has to keep a full virtual machine running just for you. How long should it idle there taking resources when you only pay a static monthly fee and not hourly?
They have limited resources and can't keep everyone's VM running forever.
I pay $5/mo to Vultr for a VM that runs continuously and maintains 25GB of state.
It does not. It just has a fast way to give you the illusion it "runs continuously" with 25GB of warm memory.
Tbh, I'm not sure paged VRAM could solve this problem for a system with an (assumed) very high cache-miss rate, such as a major LLM server.