Comment by magicalhippo
2 hours ago
Smaller models are less tolerant of quantization. For a 12B model I wouldn't expect Q4 to be "pretty close" unless it underwent quantization-aware training (QAT). Of course this isn't set in stone: there's huge variance between models, so this one might surprise.
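To put a rough number on what Q4 gives up per weight, here is a minimal sketch that round-trips some random weights through 4-bit and 8-bit quantization and compares the error. It assumes plain symmetric absmax quantization over blocks of 32 and Gaussian toy weights; the function names and parameters are made up for illustration, and this is not the scheme any particular runtime uses for its Q4 formats. It only shows per-weight rounding error, not how much of that error a model of a given size can absorb.

```python
import math
import random


def quantize_roundtrip(weights, bits, block_size=32):
    """Quantize to signed `bits`-bit integers per block (absmax scaling), then dequantize.

    Hypothetical helper for illustration: simple round-to-nearest,
    one scale per block, no zero point.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    restored = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        absmax = max(abs(w) for w in block)
        # Avoid division by zero for an all-zero block.
        scale = absmax / qmax if absmax > 0 else 1.0
        for w in block:
            q = max(-qmax, min(qmax, round(w / scale)))
            restored.append(q * scale)
    return restored


def rms_error(original, restored):
    """Root-mean-square difference between two equal-length lists."""
    total = sum((a - b) ** 2 for a, b in zip(original, restored))
    return math.sqrt(total / len(original))


random.seed(0)
# Toy stand-in for a weight tensor; real weight distributions differ.
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

err4 = rms_error(weights, quantize_roundtrip(weights, bits=4))
err8 = rms_error(weights, quantize_roundtrip(weights, bits=8))

print(f"4-bit RMS error: {err4:.6f}")
print(f"8-bit RMS error: {err8:.6f}")
print(f"ratio: {err4 / err8:.1f}x")
```

With the same block scale, the 4-bit grid has 7 positive levels against 127 for 8-bit, so the rounding step (and with it the error) is much coarser. Whether a given model shrugs that off is exactly the part that varies.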