Comment by gpm
21 hours ago
Huh, so the metadata says 1.1 trillion parameters, each 32 or 16 bits.
But the files are only roughly 640GB in size (~10GB * 64 files, slightly less in fact). Shouldn't they be closer to 2.2TB?
The bulk of Kimi-K2.6's parameters are stored with 4 bits per weight, not 16 or 32. There are a few parameters that are stored with higher precision, but they make up only a fraction of the total parameters.
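A rough back-of-envelope check, assuming most parameters are stored at 4 bits and the rest at bf16 (the exact split fraction below is an assumption for illustration, not a number from the model card):

```python
# Rough size estimate for ~1.1T parameters, mostly int4.
total_params = 1.1e12
int4_fraction = 0.95  # assumed split; the real fraction is not published here
int4_bytes = total_params * int4_fraction * 4 / 8   # 4 bits = half a byte
bf16_bytes = total_params * (1 - int4_fraction) * 2  # bf16 = 2 bytes
total_gb = (int4_bytes + bf16_bytes) / 1e9
print(f"~{total_gb:.0f} GB")
```

That lands in the same ballpark as the ~640GB of files, whereas 1.1T parameters at 16 bits would be ~2.2TB.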
Huh, cool. I guess that makes a lot of sense with all the success the quantization people have been having.
So am I misunderstanding "Tensor type F32 · I32 · BF16" or is it just tagged wrong?
The MoE experts are quantized to int4, all other weights like the shared expert weights are excluded from quantization and use bf16.
The I32 tensors are eight 4-bit values packed into one int32.
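A minimal sketch of that packing (the low-nibble-first bit layout here is an assumption for illustration; the actual on-disk layout may differ):

```python
def pack_int4(values):
    # Pack eight 4-bit values (0..15) into one 32-bit integer.
    # Nibble order (lowest bits first) is assumed, not taken from the format spec.
    assert len(values) == 8 and all(0 <= v < 16 for v in values)
    packed = 0
    for i, v in enumerate(values):
        packed |= v << (4 * i)
    return packed

def unpack_int4(packed):
    # Recover the eight 4-bit values from one 32-bit integer.
    return [(packed >> (4 * i)) & 0xF for i in range(8)]

vals = [3, 15, 0, 7, 1, 9, 12, 4]
assert unpack_int4(pack_int4(vals)) == vals
```

This is why the tensor viewer reports an integer type rather than a 4-bit float type: the container just sees int32 words.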
The description specifically says:
"Kimi-K2.6 adopts the same native int4 quantization method as Kimi-K2-Thinking."