Comment by onlyrealcuzzo
4 hours ago
Aren't most major LLMs moving to a mixture-of-experts architecture, where the model is made up of tons of smaller expert models?
There's a mountain of reasons why this makes sense from a cost perspective, and seemingly for quality too, since the newer models train substantially more cheaply and still outperform the older ones.
Naively, this seems like it would be relevant.
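For readers unfamiliar with the setup being described: in a mixture-of-experts layer, a small router scores each token and only the top-k expert sub-networks run for that token, so most of the parameters sit idle on any given forward pass. The sketch below is a minimal, illustrative toy (all names, sizes, and the plain-numpy implementation are hypothetical, not any particular model's code).

```python
import numpy as np

# Toy mixture-of-experts (MoE) routing sketch. Sizes are illustrative.
rng = np.random.default_rng(0)
d_model, d_hidden, n_experts, top_k = 16, 32, 8, 2

# Each "expert" is a tiny two-layer MLP; the router is a single linear map.
experts = [
    (rng.standard_normal((d_model, d_hidden)) * 0.02,
     rng.standard_normal((d_hidden, d_model)) * 0.02)
    for _ in range(n_experts)
]
router_w = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_layer(x):
    """x: (n_tokens, d_model) -> (n_tokens, d_model)."""
    logits = x @ router_w                           # router score per expert
    top = np.argsort(logits, axis=-1)[:, -top_k:]   # top-k experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        scores = logits[t, top[t]]
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                    # softmax over chosen experts
        for w, e in zip(weights, top[t]):
            w1, w2 = experts[e]
            h = np.maximum(x[t] @ w1, 0.0)          # expert MLP with ReLU
            out[t] += w * (h @ w2)
    return out

tokens = rng.standard_normal((4, d_model))
print(moe_layer(tokens).shape)  # (4, 16): only 2 of 8 experts ran per token
```

The cost argument falls out of the routing: total parameter count scales with the number of experts, but compute per token scales only with top_k, so capacity grows much faster than training or inference FLOPs.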