Comment by johndough

21 hours ago

> The bulk of Kimi-K2.6's parameters are stored at 4 bits per weight, not 16 or 32. A few parameters are stored at higher precision, but they make up only a small fraction of the total parameter count.

Huh, cool. I guess that makes a lot of sense with all the success the quantization people have been having.

So am I misunderstanding "Tensor type F32 · I32 · BF16" or is it just tagged wrong?
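For what it's worth, a back-of-the-envelope sketch of why a handful of higher-precision tensors barely move the average. The tensor names, sizes, and dtype split below are hypothetical, not the actual Kimi-K2.6 checkpoint layout:

```python
# Hypothetical tensor inventory: most parameters packed at 4 bits,
# a few comparatively small tensors kept at higher precision.
tensors = {
    "expert_weights_4bit": (1_000_000_000, 4),   # (param count, bits per weight)
    "embeddings_bf16":     (50_000_000, 16),
    "norms_f32":           (1_000_000, 32),
}

total_params = sum(n for n, _ in tensors.values())
total_bits = sum(n * b for n, b in tensors.values())
avg_bits = total_bits / total_params
print(f"{avg_bits:.2f} bits/weight on average")  # close to 4, despite the F32/BF16 tensors
```

This would also be consistent with a "Tensor type F32 · I32 · BF16" label simply enumerating the dtypes present in the file: 4-bit blocks are commonly packed into integer tensors, so the listing says nothing about the proportion stored at each precision.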