Comment by onlyrealcuzzo
4 hours ago
Aren't most major LLMs moving to a mixture-of-experts architecture, where the model is made up of tons of smaller expert models?
There's a mountain of reasons why this makes sense from a cost perspective, and seemingly for quality too, since the newer models train substantially more cheaply and still outperform the older ones.
Naively, this seems like it would be relevant.
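For readers unfamiliar with the setup being described: in a mixture-of-experts layer, a small router scores each token and only the top-k expert sub-networks run for that token, so most of the parameters sit idle on any given forward pass. The sketch below is a minimal, illustrative toy (all names, sizes, and the plain-numpy implementation are hypothetical, not any particular model's code).

```python
import numpy as np

# Toy mixture-of-experts (MoE) routing sketch. Sizes are illustrative.
rng = np.random.default_rng(0)
d_model, d_hidden, n_experts, top_k = 16, 32, 8, 2

# Each "expert" is a tiny two-layer MLP; the router is a single linear map.
experts = [
    (rng.standard_normal((d_model, d_hidden)) * 0.02,
     rng.standard_normal((d_hidden, d_model)) * 0.02)
    for _ in range(n_experts)
]
router_w = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_layer(x):
    """x: (n_tokens, d_model) -> (n_tokens, d_model)."""
    logits = x @ router_w                           # router score per expert
    top = np.argsort(logits, axis=-1)[:, -top_k:]   # top-k experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        scores = logits[t, top[t]]
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                    # softmax over chosen experts
        for w, e in zip(weights, top[t]):
            w1, w2 = experts[e]
            h = np.maximum(x[t] @ w1, 0.0)          # expert MLP with ReLU
            out[t] += w * (h @ w2)
    return out

tokens = rng.standard_normal((4, d_model))
print(moe_layer(tokens).shape)  # (4, 16): only 2 of 8 experts ran per token
```

The cost argument falls out of the routing: total parameter count scales with the number of experts, but compute per token scales only with top_k, so capacity grows much faster than training or inference FLOPs.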