Comment by RandyOrion

14 days ago

I guess I have to say thank you Meta?

A somewhat sad rant below.

DeepSeek started a toxic trend of releasing super, super large MoE models. MoE is notoriously parameter-inefficient, which is unfriendly to normal consumer hardware with limited VRAM.

The sheer size of these LLMs also stops nearly everyone from doing meaningful development on them. R1-1776 is the only fine-tuned variant of R1 that has made any noise, and it came from a corporation, not some random individual.

In this release, the smallest Llama 4 model is over 100B parameters, which is not small by any means and will keep most people from fine-tuning it as well.

On top of that, accessing Llama models on Hugging Face has become notoriously hard because of 'permission' issues. See details at https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct/dis...

Yeah, I personally don't really see the point of releasing large MoEs. I'll stick to small and dense LLMs from Qwen, Mistral, Microsoft, Google and others.

Edit: This comment got downvoted, too. Please explain your reasoning before doing that.

Have you heard of the bitter lesson? Bigger means better in Neural Networks.

  • Yeah. I know the bitter lesson.

    For neural networks, a larger size generally means a higher performance ceiling. On the other hand, you have to find ways to actually realize that advantage over smaller models, or the extra size just becomes a burden.

    However, I'm talking about local usage of LLMs, not production usage. Local use is severely limited by consumer GPUs with low VRAM: beyond a certain size you literally cannot run the model at all (rough numbers below).
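
    A rough back-of-the-envelope sketch of the weights-only VRAM cost, assuming 4-bit quantization and ignoring KV cache, activations, and runtime overhead (so real usage is higher); the parameter counts are the published totals for each model:

      # Weights-only VRAM estimate; ignores KV cache, activations, and overhead.
      def weight_vram_gib(params_billion, bits_per_weight=4):
          return params_billion * 1e9 * bits_per_weight / 8 / 1024**3

      for name, params_b in [("Llama 4 Scout (109B total)", 109),
                             ("DeepSeek R1 (671B total)", 671),
                             ("Qwen2.5-32B (dense)", 32)]:
          print(f"{name}: ~{weight_vram_gib(params_b):.0f} GiB at 4-bit")

      # ~51 GiB, ~312 GiB, ~15 GiB respectively: only the 32B dense model
      # fits on a 24 GiB consumer GPU, even with aggressive quantization and
      # even though the MoE giants only activate a fraction of their
      # parameters per token.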

To the people who downvoted this comment: do you really have 80GB-VRAM GPUs or an M3 Ultra with 512GB of RAM at home?

  • I don't. I have no problem not running open-weight models myself, because there's an efficiency gap of about two orders of magnitude between a "pretend-I-can" setup and running them on hundreds of H100s for many thousands of users.