Comment by alex7o
7 hours ago
Because when you pay for a subscription, they don't silently quantize the model a few weeks after release, leaving you unable to run the full model.
Otherwise there's no need for full fp16: int8 works 99% as well for half the memory, and the lower you go, the more you start to pay for the quants. But int8 is super safe imo.
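To put rough numbers on the "half the memory" point, here is a back-of-the-envelope sketch. It only counts the weights (no KV cache, activations, or per-block quantization overhead), and the 70B parameter count is a hypothetical example, not a specific model. The int4 row is included only for comparison.

```python
# Approximate bytes needed to store one weight at each precision.
# Real quantization formats add a little overhead (scales, zero points),
# which this sketch ignores.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params: float, precision: str) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[precision] / 2**30


if __name__ == "__main__":
    num_params = 70e9  # hypothetical 70B-parameter model
    for precision in BYTES_PER_PARAM:
        gib = weight_memory_gib(num_params, precision)
        print(f"{precision}: ~{gib:.0f} GiB for weights")
```

The ratio is what matters here: int8 takes exactly half the weight memory of fp16 at any model size, and each step below that halves it again while (per the point above) costing more in quality.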