Comment by coder543

5 days ago

The Llama 4 models were instruct models at a time when everyone was hyped about and expecting reasoning models. As instruct models, I agree they seemed fine, and I think Meta mostly dropped the ball by taking the negative community feedback as a signal that they should just give up. They’ve had plenty of time to train and release a Llama-4.5 by now, which could include reasoning variants and even stronger instruct models, and I think the community would have come around. Instead, it sounds like they’re focusing on closed source models that seem destined for obscurity, where Llama was at least widely known.

On the flip side, it also shows how damaging echo chambers can be, where relatively few people even gave the models a chance, just repeating the negativity they heard from other people and downvoting anyone who voiced a different experience.

I think this was exacerbated by the fact that Llama models had previously come in small, dense sizes like 8B that people could run on modest hardware, where even Llama 4 Scout was a large model that a lot of people in the community weren’t prepared to run. Large models seem more socially accepted now than they were when Llama 4 launched.

Large MoE models are more socially accepted because medium/large MoE models can still have a fairly small active parameter count per token, which is what sets the per-token compute and how much has to sit in fast VRAM if inactive experts are offloaded. But a large dense model of the same total size is still challenging to run.
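A rough back-of-the-envelope sketch of the active-vs-total distinction. The parameter counts below are the publicly stated figures for Llama 4 Scout (109B total, 17B active); the `weight_gb` helper is illustrative and ignores KV cache, activations, and runtime overhead:

```python
def weight_gb(params_billions: float, bits_per_weight: float = 4.0) -> float:
    """Approximate weight storage in GB at a given quantization level."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

# A dense 70B model needs all 70B weights in fast memory:
dense_70b = weight_gb(70)      # ~35 GB at 4-bit

# Scout's full 109B weights still have to live somewhere:
scout_total = weight_gb(109)   # ~54.5 GB at 4-bit

# ...but only ~17B are active per token, so per-token compute and
# bandwidth look more like a 17B model, and inactive experts can be
# offloaded to slower system RAM:
scout_active = weight_gb(17)   # ~8.5 GB at 4-bit
```

This is why a big MoE can feel more approachable than a dense model of the same total size: the working set per token is much smaller, even though the full weights are not.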

  • I meant large MoE models are more socially accepted now. They were not when Llama 4 launched, and I believe that worked against the Llama 4 models.

    The Llama 4 models are MoE models, in case you are unaware, since your comment seemed to imply they were dense models.