Comment by Centigonal
2 days ago
> MAI-Thinking-1 is a 35B-active, ~1T-total parameters, sparse Mixture of Experts model, a smaller inference footprint than much larger models.
This seemingly nonsensical sentence (of course this will have a smaller inference footprint than larger models) suggests this model's competitors have larger inference footprints and total parameter sizes.
When would a larger model have a smaller inference footprint? If the larger was MoE and the smaller was dense?
Yes, MoE reduces the inference compute requirements, since only the active experts run for each token (inference memory reqs remain the same).
As someone who has spent quite a lot of time on inference, I would add a small note:
Deployment looks very different for MoE models than for dense ones, so I would say it is more nuanced than "inference memory reqs remain the same". Memory can be very different for MoE-style models.
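
To put rough numbers on the compute-versus-memory split discussed in this thread, here is a back-of-the-envelope sketch. The 35B-active and ~1T-total figures come from the quoted sentence; everything else is an assumption for illustration: the dense 70B model is a hypothetical comparison point, weights are assumed to be stored at 2 bytes per parameter, and per-token compute is approximated as 2 FLOPs per active parameter. It counts weights only and ignores KV cache, activations, and how experts are sharded or offloaded, which is exactly where real deployments diverge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSize:
    name: str
    total_params: float   # every weight that has to be stored somewhere
    active_params: float  # weights actually used for one token


def weight_memory_gb(model: ModelSize, bytes_per_param: float = 2.0) -> float:
    """Weight storage in GB, assuming 2 bytes per parameter (an assumption)."""
    return model.total_params * bytes_per_param / 1e9


def flops_per_token(model: ModelSize) -> float:
    """Rough forward-pass cost: about 2 FLOPs per active parameter per token."""
    return 2.0 * model.active_params


# Figures from the quoted announcement sentence.
moe = ModelSize("MoE, 35B active / ~1T total", total_params=1e12, active_params=35e9)
# Hypothetical dense model, chosen only for comparison.
dense = ModelSize("dense 70B (hypothetical)", total_params=70e9, active_params=70e9)

for m in (moe, dense):
    print(f"{m.name}: ~{weight_memory_gb(m):.0f} GB of weights, "
          f"~{flops_per_token(m) / 1e9:.0f} GFLOPs per token")
```

Under these assumptions the MoE model does about half the per-token compute of the dense one while needing far more weight storage, so "smaller inference footprint" only holds if footprint means compute.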