Comment by PeterStuer
2 hours ago
It does not. It just has a fast way to give you the illusion it "runs continuously" with 25GB of warm memory.
Tbh, I'm not sure paged VRAM could solve this problem for a system with an (assumed) very high cache-miss rate, such as a major LLM server.