Yeah. I know the bitter lesson.
For neural networks, a larger size generally means a higher performance ceiling. On the other hand, you have to find ways to actually realize those advantages over smaller models, or the extra size just becomes a burden.
However, I'm talking about running LLMs locally rather than in production, and local usage is severely constrained by GPUs with limited VRAM: you simply cannot run a model beyond a certain size.
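To make that VRAM limit concrete, here's a minimal back-of-the-envelope sketch. It assumes memory is dominated by the weights, with a rough multiplier for KV cache and runtime overhead; the function name and the 1.2 factor are illustrative assumptions, not taken from any library.

```python
# Rough VRAM estimate for running an LLM locally.
# Assumption: memory ~ weight bytes * small overhead factor (KV cache, buffers).
# Numbers are illustrative only.

def estimate_vram_gb(params_billion: float, bits_per_weight: int, overhead: float = 1.2) -> float:
    """Approximate VRAM (in GB) needed to load a model of the given size."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

if __name__ == "__main__":
    # e.g. a 7B model at 4-bit quantization fits on an 8 GB card; a 70B model does not.
    for size in (7, 13, 70):
        for bits in (16, 4):
            print(f"{size}B @ {bits}-bit: ~{estimate_vram_gb(size, bits):.1f} GB")
```

Under these assumptions, even aggressive 4-bit quantization leaves a 70B model well beyond what a typical consumer GPU can hold, which is the hard ceiling I mean.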