Comment by rockinghigh
18 hours ago
The MoE experts are quantized to int4; all other weights, such as the shared expert weights, are excluded from quantization and kept in bf16.
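A minimal sketch of that selective scheme, assuming hypothetical parameter names (routed expert weights under `.experts.`, the shared expert under `shared_expert`) and a simple symmetric per-tensor int4 quantizer — not the model's actual layout:

```python
def is_int4_quantized(param_name: str) -> bool:
    # Only routed MoE expert weights get int4; the shared expert
    # (and everything else) stays in bf16. Names are illustrative.
    return ".experts." in param_name and "shared_expert" not in param_name

def quantize_int4_symmetric(values):
    # Symmetric per-tensor int4: map floats onto integers in [-8, 7]
    # with a single scale derived from the max absolute value.
    scale = max(abs(v) for v in values) / 7 or 1.0
    q = [max(-8, min(7, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    # Recover approximate float values; error is bounded by scale/2.
    return [v * scale for v in q]
```

So a weight named `layers.0.mlp.experts.3.w1` would be quantized, while `layers.0.mlp.shared_expert.w1` would pass through untouched.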