Comment by knollimar

3 days ago

Isn't it only partially quantized? I thought some dense parts were left unquantized but most of it is int4?

knollimar

Often in MoE models the experts are quantized while the shared portions, which are a much smaller part of the network but have a greater impact on output quality, are kept at higher or full precision. I'm not familiar with the Kimi QAT approach specifically, but it's likely they do this.
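
To make the "mostly int4, partly dense" idea concrete, here's a rough sketch of how the average bits per weight works out under that kind of policy. The parameter counts and the `bits_for` policy are made up for illustration; they are not taken from Kimi or any real model.

```python
# Hypothetical parameter counts (in billions) for an illustrative MoE model.
# These numbers are invented for the example, not from any real checkpoint.
PARAM_COUNTS_B = {
    "experts": 96.0,        # routed expert weights: the bulk of the network
    "attention": 3.0,       # shared by every token
    "shared_expert": 0.5,   # always-active expert
    "embeddings": 0.5,
}


def bits_for(component: str) -> int:
    """Assumed policy: routed experts are int4, everything else stays 16-bit."""
    return 4 if component == "experts" else 16


def summarize(param_counts: dict) -> dict:
    """Return average bits per weight and the fraction of weights stored as int4."""
    total = sum(param_counts.values())
    total_bits = sum(n * bits_for(name) for name, n in param_counts.items())
    int4_params = sum(n for name, n in param_counts.items() if bits_for(name) == 4)
    return {
        "avg_bits_per_weight": total_bits / total,
        "fraction_int4": int4_params / total,
    }


summary = summarize(PARAM_COUNTS_B)
print(summary)
```

With these made-up numbers, 96% of the weights are int4 but the model averages a bit over 4 bits per weight, which is why such a model can be "not completely quantized" and still be close to int4 in size.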