Comment by jasonjmcghee
5 hours ago
> if you want to understand the effects of quantization on model quality, it's really easy to spin up a GPU server instance and play around
Fwiw, not necessarily. I've noticed quantized models have strange and surprising failure modes: everything seems to be working well, and then the model goes into a death spiral repeating a specific word, or completely fails on one task out of a handful of similar tasks.
The difference between 8-bit and 4-bit can be almost imperceptible or night and day.
This isn't something you'd necessarily see just playing around, but you do run into it when trying to do something specific.
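For what it's worth, the "death spiral" case is the easier one to catch automatically if you're running a batch of task-specific prompts rather than eyeballing outputs. A rough sketch, assuming a repetition loop shows up as the same whitespace-separated word many times in a row (the function name and the threshold of 8 are made up for illustration, not taken from any library):

```python
def has_repetition_loop(text: str, min_repeats: int = 8) -> bool:
    """Return True if any single word occurs min_repeats times in a row.

    Assumption: a degenerate output looks like one word repeated
    back to back. Loops over longer phrases are not detected here.
    """
    tokens = text.split()
    run = 1  # length of the current run of identical consecutive words
    for prev, cur in zip(tokens, tokens[1:]):
        run = run + 1 if cur == prev else 1
        if run >= min_repeats:
            return True
    return False
```

You'd run this over the outputs of the 8-bit and 4-bit variants on the same prompts and compare how often each one trips it. It won't catch the other failure mode (quietly failing one task out of several similar ones); that still needs per-task checks.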