Comment by Tuna-Fish

3 days ago

The project is an inference framework that should support a 100B-parameter model at 5-7 tok/s on CPU. No one has quantized a 100B-parameter model to 1 trit yet, but the framework existing is an incentive for someone to do so.

> quantized a 100B parameter model to 1 trit

I had the same question. After some back and forth with ChatGPT: it's not the post-training quantization we often see these days; you have to use 1-trit (ternary) weights from the start of pre-training.
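
For anyone wondering what "1 trit from the start of pre-training" roughly looks like, here's a minimal sketch of BitNet b1.58-style absmean ternary quantization with a straight-through estimator. The names (`ternary_quantize`, `TernaryLinear`) and the per-tensor scaling are my own illustrative assumptions, not the project's actual code.

```python
import torch

def ternary_quantize(w: torch.Tensor) -> torch.Tensor:
    """Map full-precision weights to {-1, 0, +1} * scale (one 'trit' each)."""
    scale = w.abs().mean().clamp(min=1e-8)   # per-tensor absmean scale (assumed)
    w_q = (w / scale).round().clamp(-1, 1)   # values constrained to {-1, 0, +1}
    return w_q * scale

class TernaryLinear(torch.nn.Linear):
    """Linear layer that quantizes its weights to 1 trit on every forward pass.

    The straight-through estimator, w + (q - w).detach(), forwards the ternary
    weights but routes gradients to the latent full-precision weights. That is
    why this has to run throughout pre-training rather than as a post-hoc step.
    """
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        w_q = w + (ternary_quantize(w) - w).detach()
        return torch.nn.functional.linear(x, w_q, self.bias)

# Usage: swap TernaryLinear in for nn.Linear when building the model,
# then pre-train as usual; the learned weights stay ternary-compatible.
layer = TernaryLinear(1024, 1024, bias=False)
y = layer(torch.randn(2, 1024))
```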