Comment by rockinghigh
17 hours ago
The MoE experts are quantized to int4; all other weights, such as the shared expert weights, are excluded from quantization and remain in bf16.
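A minimal sketch of what such selective quantization could look like, assuming a flat state dict where routed-expert parameter names contain "experts" and shared-expert names contain "shared_expert" (both naming conventions are assumptions; NumPy float arrays stand in for bf16 tensors):

```python
import numpy as np

def quantize_int4(w):
    # Symmetric per-tensor int4: map values into [-8, 7] with one fp scale.
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def quantize_moe(state_dict):
    out = {}
    for name, w in state_dict.items():
        if "experts" in name and "shared_expert" not in name:
            out[name] = quantize_int4(w)   # routed experts -> int4 + scale
        else:
            out[name] = w                  # everything else kept high precision
    return out
```

Real deployments typically use finer granularity (per-channel or per-group scales) rather than the per-tensor scale shown here.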