Comment by syntaxing
3 hours ago
Q8 or Q6_UD with no KV cache quantization. I swear it matters even more with small-activated-parameter MoE models, despite the minimal KL divergence drop.
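For context, a sketch of what this setup might look like with llama.cpp's CLI, assuming the real `-ctk`/`-ctv` (`--cache-type-k`/`--cache-type-v`) and `-fa` flags; the model filename is hypothetical (an Unsloth dynamic Q6 quant of a small-activated-parameter MoE):

```shell
# Hypothetical invocation matching the comment's recommendation:
# Q6 "UD" (Unsloth dynamic) weight quantization, while the KV cache is
# left at its default f16 precision (no cache-type flags passed).
llama-cli -m Qwen3-30B-A3B-UD-Q6_K_XL.gguf -p "Hello" -n 128

# By contrast, this is the KV cache quantization the comment avoids:
# q8_0 K and V caches (quantized V cache requires flash attention, -fa).
llama-cli -m Qwen3-30B-A3B-UD-Q6_K_XL.gguf -p "Hello" -n 128 \
  -ctk q8_0 -ctv q8_0 -fa
```

The trade-off implied here: quantizing the KV cache saves memory at long context, but on MoE models with few active parameters the comment's experience is that the quality cost outweighs the saving.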