Comment by dr_kiszonka
2 days ago
When would a larger model have a smaller inference footprint? If the larger was MoE and the smaller was dense?
2 days ago
Yes, MoE reduces the inference compute requirements, since only a few experts run per token (inference memory reqs remain the same, since all experts still have to be loaded).
As someone who has spent quite a lot of time on inference, I would add a small note:
Deployment looks very different for MoE than for dense models, so I would say it is more nuanced than "inference memory reqs remain the same". Memory can be very different for MoE-style models.
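To make the compute vs. memory distinction in this thread concrete, here is a rough back-of-the-envelope sketch. All parameter counts are made up for illustration and do not describe any real model. It assumes roughly 2 FLOPs per active parameter per token and 2 bytes per weight, and it deliberately ignores the KV cache, activations, routing overhead, and deployment choices such as expert parallelism or offloading, which is where real MoE memory can differ a lot.

```python
from dataclasses import dataclass


@dataclass
class ModelSpec:
    """Hypothetical model description; a dense model is the 1-expert case."""
    shared_params: int       # always-used weights (attention, embeddings, ...)
    expert_params: int       # weights in ONE expert / feed-forward block
    num_experts: int         # experts stored in the model
    experts_per_token: int   # experts actually run for each token

    def total_params(self) -> int:
        # Everything that has to be stored somewhere.
        return self.shared_params + self.expert_params * self.num_experts

    def active_params(self) -> int:
        # Only what is actually used to process a single token.
        return self.shared_params + self.expert_params * self.experts_per_token


def weight_memory_gb(spec: ModelSpec, bytes_per_param: int = 2) -> float:
    # Weights only; assumes every expert is kept resident in memory.
    return spec.total_params() * bytes_per_param / 1e9


def flops_per_token(spec: ModelSpec) -> int:
    # Common rule of thumb: ~2 FLOPs per active parameter per token.
    return 2 * spec.active_params()


# Illustrative numbers only (assumptions, not real models).
dense = ModelSpec(shared_params=2_000_000_000, expert_params=5_000_000_000,
                  num_experts=1, experts_per_token=1)
moe = ModelSpec(shared_params=1_000_000_000, expert_params=1_000_000_000,
                num_experts=16, experts_per_token=2)

for name, spec in [("dense", dense), ("moe", moe)]:
    print(f"{name}: total={spec.total_params() / 1e9:.0f}B "
          f"active={spec.active_params() / 1e9:.0f}B "
          f"weights={weight_memory_gb(spec):.0f}GB "
          f"flops/token={flops_per_token(spec):.1e}")
```

With these made-up numbers the MoE model is larger in total parameters and weight memory but does less compute per token than the dense one. How much of that weight memory has to sit on each device depends on how the experts are sharded or offloaded, which this sketch does not model.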