Comment by zargon
5 days ago
Total parameter count, not active parameter count, is what determines how robust a model is under extreme quantization.
Once you're swapping from disk, performance will be unusable for most people. And for local inference, the KV cache is the worst possible thing to put on disk.
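Rough back-of-the-envelope sketch of why the KV cache is such a bad fit for disk. This assumes plain attention where every generated token reads the entire cache, with nothing served from RAM. The model shape and the disk bandwidth are made-up illustrative numbers, not measurements of any particular model or drive.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Size of the KV cache: one key and one value tensor per layer,
    each holding seq_len * n_kv_heads * head_dim elements."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


def seconds_per_token(cache_bytes, disk_bytes_per_sec):
    """Time spent just re-reading the cache for one generated token,
    assuming the whole cache comes off disk every step."""
    return cache_bytes / disk_bytes_per_sec


# Hypothetical model shape and context length, fp16 cache entries.
size = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=32768)

# Hypothetical sustained disk read bandwidth of 2 GB/s.
latency = seconds_per_token(size, 2e9)

print(f"KV cache: {size / 2**30:.1f} GiB")
print(f"Cache read time per token: {latency:.2f} s")
```

With these assumed numbers the cache alone costs a couple of seconds of disk reads per token, before any weights are touched.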