Comment by moffkalast (1 day ago)

In retrospect it's actually funny that last year Meta spent so many resources training a dense 405B model that both underperforms models a tenth its size and is impossible to run at a reasonable speed on any hardware in existence.

Strong disagree.

Llama 4's release in 2025 was (deservedly) panned, but Llama 3.1 405B does not deserve that slander.

https://artificialanalysis.ai/#frontier-language-model-intel...

Do not compare 2024 models to the current cutting edge. At the time, Llama 3.1 405B was the very first open-source (open-weights) model to come close to the closed-source cutting edge. It was very, very close in performance to GPT-4o and Claude 3.5 Sonnet.

In essence, it was DeepSeek R1 before DeepSeek R1.

  • He is definitely talking about Llama 4.

    • > last year

      > dense

      > 405B model

      Llama 4 does not match any of these details. Maybe the commenter thinks their comment is about Llama 4 (I don't see a reason to believe so), but readers familiar with these details will recognize they are referring to Llama 3.1.


It's not that clear. Yes, it underperforms on recent benchmarks and use cases (e.g. agentic stuff), but it is still one of the strongest open models in terms of "knowledge". Dense does have that advantage over MoE, even if it's extremely expensive to run inference on.
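To put rough numbers on that cost gap, here's a back-of-the-envelope sketch. The 405B figure is Llama 3.1's; the MoE side borrows DeepSeek-V3's published numbers (671B total, ~37B active per token) purely as a comparison point:

    # Why dense is expensive: every parameter participates in every token,
    # while an MoE model only routes each token through a few experts.
    dense_active = 405e9   # Llama 3.1 405B: all params active per token
    moe_total    = 671e9   # DeepSeek-V3: total params across all experts
    moe_active   = 37e9    # DeepSeek-V3: params actually used per token

    # Crude proxy: inference FLOPs per token ~= 2 * active parameters.
    flops_dense = 2 * dense_active   # ~8.1e11 FLOPs/token
    flops_moe   = 2 * moe_active     # ~7.4e10 FLOPs/token

    print(f"dense / MoE cost ratio: {flops_dense / flops_moe:.1f}x")  # ~10.9x

The dense model still brings all 405B parameters of stored knowledge to bear on every token, which is presumably where that knowledge edge comes from.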

Check out this great exercise - https://open.substack.com/pub/outsidetext/p/how-does-a-blind...

  • Ok wow, that is incredibly interesting, what a test. I would've honestly expected just random noise (like if you gave this same task to a human, lol), but you can even see related models produce similar maps. Maybe it's an indicator of overall knowledge, or of how consistent the world model is. Then again, it might not correlate at all with non-geographical knowledge.
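
For anyone wondering how the exercise linked above works mechanically, here is a minimal sketch of the land/water probe, assuming an OpenAI-style chat API. The model name, prompt wording, and grid resolution are my own guesses, not the post's exact setup:

    # Minimal sketch of the "blind map" probe: ask the model, point by point,
    # whether a coordinate is land or water, then assemble a binary world map.
    import numpy as np
    from openai import OpenAI

    client = OpenAI()

    def is_land(lat: float, lon: float) -> bool:
        # One coordinate per request, no tools or retrieval: the model must
        # answer purely from whatever geography it memorized in training.
        resp = client.chat.completions.create(
            model="gpt-4o",  # swap in whichever model you're probing
            messages=[{
                "role": "user",
                "content": f"Is latitude {lat}, longitude {lon} on land or in "
                           "water? Answer with exactly one word: Land or Water.",
            }],
            max_tokens=3,
        )
        return resp.choices[0].message.content.strip().lower().startswith("land")

    # Sweep a coarse 5-degree grid and collect the model's "mental map".
    lats = np.arange(-90, 91, 5)
    lons = np.arange(-180, 181, 5)
    world = np.array([[is_land(la, lo) for lo in lons] for la in lats])

Plotting world with matplotlib's imshow then shows the model's picture of the Earth; a finer grid gives a sharper map at the cost of many more API calls.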