Comment by mxwsn

21 hours ago

How do you know that width scaling has been the driving force of improvement?

3 comments

mxwsn

I am no insider and have never even tried to build an LLM, so I can only guess. But the general sentiment seems to be that this is the case. If you are interested, I would recommend you read the MIT paper "Superposition Yields Robust Neural Scaling" [0]. It confirms an interesting trend: models represent more features/concepts than they have clean independent dimensions, so features overlap. Increasing model dimension reduces this geometric interference, which lowers loss in a predictable way, but with diminishing returns.

This has, in my opinion, likely been the primary driver of better models thus far, but the MIT paper shows mathematically that each added dimension yields diminishing returns. Scaling this way will get more and more expensive, and the cost-to-return ratio will make it infeasible, if it hasn't already.
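To make the geometric interference point concrete, here is a toy sketch of my own, not the paper's model or method. It assumes features are random unit directions in a `dim`-dimensional space and measures how much they overlap on average. The function names and the choice of 200 features are made up for illustration.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a random direction: Gaussian components, then normalize to length 1."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mean_squared_overlap(n_features, dim, seed=0):
    """Average squared dot product over all distinct pairs of feature directions.

    Assumption: features are independent random directions. A trained model
    would arrange them less randomly, so this is only a rough picture.
    """
    rng = random.Random(seed)
    vecs = [random_unit_vector(dim, rng) for _ in range(n_features)]
    total, pairs = 0.0, 0
    for i in range(n_features):
        for j in range(i + 1, n_features):
            dot = sum(a * b for a, b in zip(vecs[i], vecs[j]))
            total += dot * dot
            pairs += 1
    return total / pairs


if __name__ == "__main__":
    # Same number of features, packed into wider and wider spaces.
    for dim in (8, 32, 128):
        print(dim, mean_squared_overlap(200, dim), 1.0 / dim)
```

For random directions the expected squared overlap is 1/dim, so widening the space always helps, but each added dimension removes less overlap than the one before it. That is the diminishing-returns shape in miniature; whether real models follow it is what the paper examines.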

Ilya appears to support this sentiment as well. [1]

[0] - https://openreview.net/forum?id=knPz7gtjPW

[1] - https://www.businessinsider.com/openai-cofounder-ilya-sutske...

waterTanuki 17 hours ago

I mean, it's not exactly a PhD-level question. One can infer from the extreme demand for GPUs and DRAM, plus new data center construction, that all the providers are banking on width.

svnt 12 hours ago

No? That could just be FOMO, actual adoption, or a number of other things.