
Comment by zozbot234

5 days ago

Large MoE models are more socially accepted because even medium/large MoE models can still be quite small in terms of expert size, and the expert size is what sets the amount of VRAM required. A large dense model, by contrast, is still challenging to get running.
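
For a sense of scale, here is a rough back-of-the-envelope sketch of the difference between a model's total weight footprint and its active (per-token) expert slice. The parameter counts and the 4-bit quantization below are illustrative assumptions, not official figures for any particular model:

    # Rough memory estimates for MoE vs dense models (all numbers assumed).
    def weight_bytes_gib(params_billion: float, bits_per_weight: int = 4) -> float:
        """Approximate weight memory in GiB for a given parameter count and quantization."""
        return params_billion * 1e9 * bits_per_weight / 8 / 2**30

    moe_total_b = 400    # hypothetical MoE: total parameters, in billions
    moe_active_b = 17    # hypothetical MoE: active parameters per token, in billions
    dense_b = 70         # hypothetical dense model, in billions

    print(f"MoE, all experts resident:   {weight_bytes_gib(moe_total_b):6.1f} GiB")
    print(f"MoE, active slice per token: {weight_bytes_gib(moe_active_b):6.1f} GiB")
    print(f"Dense model:                 {weight_bytes_gib(dense_b):6.1f} GiB")

The gap between the first two numbers is why an MoE with a huge total parameter count can still be practical to run when only the active experts need to sit in fast memory at once.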

I meant large MoE models are more socially accepted now. They were not when Llama 4 launched, and I believe that worked against the Llama 4 models.

The Llama 4 models are MoE models, in case you are unaware, since your comment seemed to imply they were dense models.