Comment by WhitneyLand
15 hours ago
For Kimi, quantization is also part of training. Specifically, they say they use QAT (quantization-aware training).
That doesn't mean training with all-integer math. Rather, training explicitly accounts for the final weight size: fake quantization nodes are inserted to simulate int4.
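To make "fake quantization" concrete, here is a rough sketch of what such a node might compute in the forward pass. This is not Kimi's actual implementation. The function name `fake_quantize_int4` and the symmetric per-tensor scale are my own assumptions for illustration. The point is only that the weights stay floats but are snapped to the 16 values an int4 could represent.

```python
def fake_quantize_int4(weights):
    """Snap float weights to a simulated int4 grid and return floats.

    Hypothetical helper for illustration only; assumes a symmetric,
    per-tensor scale, which is one common choice among several.
    """
    qmin, qmax = -8, 7  # range of a signed 4-bit integer
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)  # nothing to scale

    scale = max_abs / qmax  # float step size between grid points
    out = []
    for w in weights:
        q = round(w / scale)           # nearest integer level
        q = max(qmin, min(qmax, q))    # clamp into the int4 range
        out.append(q * scale)          # dequantize: back to float
    return out


weights = [0.7, -0.31, 0.12, 0.0, -0.7]
simulated = fake_quantize_int4(weights)
```

The rest of the network then runs on `simulated` in ordinary floating point, so the loss already reflects the rounding error. Since rounding has zero gradient almost everywhere, QAT setups typically treat the rounding step as the identity in the backward pass (a straight-through estimator), which this sketch leaves out.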