Comment by throwdbaaway
6 months ago
That makes sense, thanks for the info. Here's a quick recap of the recent MoE models based on those criteria:
correct activated params:
* DeepSeek V3/R1 series
* Kimi K2
* GPT-OSS series
undercount activated params:
* GLM-4.5 series
overcount activated params:
* DeepSeek V2 series
* Qwen3 series
* Ernie 4.5 series
* Hunyuan A13B