Comment by make3
3 days ago
It's weird to me to train such huge models only to destroy them with 3-bit quantization of presumably 16-bit (bfloat16) weights. Why not just train smaller models then?
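For context, here is a minimal sketch of what round-to-nearest 3-bit quantization of 16-bit weights looks like, assuming simple per-tensor uniform scaling. The function names are illustrative, and production 3-bit schemes (e.g. GPTQ or AWQ) use per-group scales and calibration data rather than anything this naive:

```python
import numpy as np

# Illustrative sketch: uniform round-to-nearest 3-bit quantization.
# Assumes per-tensor asymmetric scaling; real 3-bit methods (GPTQ,
# AWQ, etc.) quantize per group and calibrate on activations.

def quantize_3bit(weights: np.ndarray):
    levels = 2**3 - 1                       # 8 codes -> integers 0..7
    w_min, w_max = weights.min(), weights.max()
    scale = (w_max - w_min) / levels
    codes = np.round((weights - w_min) / scale).astype(np.uint8)
    return codes, scale, w_min

def dequantize_3bit(codes, scale, w_min):
    return codes.astype(np.float32) * scale + w_min

rng = np.random.default_rng(0)
# float16 used as a stand-in for bfloat16, which NumPy lacks
w = rng.normal(0.0, 0.02, size=1024).astype(np.float16)
codes, scale, w_min = quantize_3bit(w.astype(np.float32))
w_hat = dequantize_3bit(codes, scale, w_min)
print("mean abs reconstruction error:", np.abs(w.astype(np.float32) - w_hat).mean())
```

With only 8 representable values per tensor, the reconstruction error is what the commenter means by "destroying" the trained weights; the trade-off is that storage drops from 16 bits to roughly 3 bits per weight.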