Comment by gpm

20 hours ago

Huh, so the metadata says 1.1 trillion parameters, each 32 or 16 bits.

But the files are only roughly 640GB in size (~10GB * 64 files, slightly less in fact). Shouldn't they be closer to 2.2TB?

The bulk of Kimi-K2.6's parameters are stored with 4 bits per weight, not 16 or 32. A few parameters are kept at higher precision, but they account for only a small fraction of the total.
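The back-of-envelope math works out (assuming the ~1.1T parameter count from the metadata, and ignoring the small higher-precision fraction):

```python
params = 1.1e12  # ~1.1 trillion parameters, per the metadata

# stored at 4 bits per weight
size_4bit_gb = params * 4 / 8 / 1e9
print(f"{size_4bit_gb:.0f} GB")  # 550 GB -- in the right ballpark for ~640GB of files

# if everything were 16-bit, as the tensor-type tag suggests
size_16bit_tb = params * 16 / 8 / 1e12
print(f"{size_16bit_tb:.1f} TB")  # 2.2 TB -- the number the question expected
```

The gap between 550GB and the actual ~640GB would be the higher-precision parameters plus any non-weight data in the files.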

  • Huh, cool. I guess that makes a lot of sense with all the success the quantization people have been having.

    So am I misunderstanding "Tensor type F32 · I32 · BF16" or is it just tagged wrong?

The description specifically says:

"Kimi-K2.6 adopts the same native int4 quantization method as Kimi-K2-Thinking."
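That would also explain the I32 in the tensor-type tag: a common convention (this is my assumption, not something the model card states) is to pack eight 4-bit weights into each 32-bit integer, so the quantized tensors show up as I32 even though the underlying weights are int4. A minimal sketch of that packing:

```python
def pack_int4(vals):
    """Pack eight unsigned 4-bit values (0..15) into one 32-bit word."""
    assert len(vals) == 8 and all(0 <= v < 16 for v in vals)
    word = 0
    for i, v in enumerate(vals):
        word |= v << (4 * i)  # nibble i goes in bits [4i, 4i+4)
    return word

def unpack_int4(word):
    """Recover the eight 4-bit values from a packed 32-bit word."""
    return [(word >> (4 * i)) & 0xF for i in range(8)]

packed = pack_int4([1, 2, 3, 4, 5, 6, 7, 8])
print(unpack_int4(packed))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

The F32/BF16 entries would then be the scales and the few full-precision parameters.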