Comment by prodigycorp

5 hours ago

Comments trashing this are rightly correct skeptics who remember the benchmaxxing of llama 4. This model was out in the woods as early as like a couple months ago but they didn't release it because it was at gemini 2.5 pro levels.

10 comments

prodigycorp

zozbot234 4 hours ago

The llama4 series was one of the earliest large MoE's to be made publically available. People just ignored it because they were focused on running smaller and denser models at the time, we should know better these days.

dilap 4 hours ago

Deepseek R1 was a publically-available, MoE model that was getting a ton of attention before llama4. Llama4 didn't get much attention because it wasn't good.
prodigycorp 4 hours ago
the models were objectively horrible
- NitpickLawyer 4 hours ago
  
  They really weren't horrible. They were ~gpt4o, with the added benefit that you could run them on premise. Just "regular" models, non "thinking". Inefficient architecture (number of active out of total) but otherwise "decent" models. They got trashed online by bots and chinese shills (I was online that weekend when it happened, it's something to behold). Just because they were non-thinking when thinking was clearly the future doesn't make them horrible. Not SotA by any means, but still.
  
  6 replies →