Comment by dr_kiszonka

2 days ago

When would a larger model have a smaller inference footprint? If the larger model were MoE and the smaller one dense?

2 comments

dr_kiszonka

Yes, MoE reduces the inference compute requirements (the inference memory requirements remain the same).

rajveerb 20 hours ago

As someone who has spent quite a lot of time on inference, I would add a small note:
Deployment looks very different for MoE models than for dense models, so I would say it is more nuanced than "inference memory reqs remain the same". Memory can be very different for MoE-style models.
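
To make the compute-versus-memory distinction in this thread concrete, here is a rough back-of-the-envelope sketch. It assumes each expert is a plain two-matrix feed-forward block and ignores attention, the router, biases, KV cache, and any expert offloading or sharding, which is exactly where the deployment nuance mentioned above comes in. All sizes (`d_model`, `d_ff`, `n_experts`, `top_k`) are hypothetical and not taken from any real model.

```python
def ffn_params(d_model: int, d_ff: int) -> int:
    """Weights in one feed-forward block: an up projection plus a down projection."""
    return 2 * d_model * d_ff


def moe_ffn_params(d_model: int, d_ff: int, n_experts: int, top_k: int) -> tuple[int, int]:
    """Return (total, active) feed-forward weights for one MoE layer.

    total  = weights across all experts (what has to be stored somewhere)
    active = weights actually used for a single token (top_k experts)
    """
    per_expert = ffn_params(d_model, d_ff)
    return n_experts * per_expert, top_k * per_expert


# Hypothetical layer sizes, chosen only for illustration.
moe_total, moe_active = moe_ffn_params(d_model=4096, d_ff=8192, n_experts=8, top_k=2)
dense_total = ffn_params(d_model=4096, d_ff=24576)

print(f"MoE   total weights per layer: {moe_total:,}")
print(f"MoE   active weights per token: {moe_active:,}")
print(f"Dense weights per layer (all active): {dense_total:,}")
```

With these made-up numbers the MoE layer holds more weights than the dense layer but touches fewer of them per token, which is one way a "larger" model can need less compute per token while still needing more memory overall.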