Comment by nl
17 hours ago
Yeah, to me it seems like an "if you have to ask, you can't run it" type of question.
In general the TL;DR is that anything above 35B needs hardware you buy basically only to run large LLMs, and if you have that hardware you don't need to ask the question.
That's simply not true.
~70B models can run fine (albeit somewhat slowly) on consumer hardware with 64GB RAM. There are heavily quantized (Q1.x) models that are still usable on similar hardware. Granted, there haven't been a lot of models of this size recently, but still, 35B isn't really the practical limit. 35B is mostly the limit if you're using consumer-grade GPUs with limited RAM and need the model to run fast.
People have been toying with running large-ish models by partially offloading to CPU+RAM, with mixed results. But as long as you're OK with reduced speed and you quantize the hell out of the big models, you can apparently try a lot more models locally than popular belief suggests.
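For a rough sense of the arithmetic behind that, here is a back-of-the-envelope sketch. It only counts the weights (parameters times bits per weight, divided by 8) and adds a flat allowance for everything else. The function names and the 8 GB `overhead_gb` figure are my own assumptions for illustration, not measured values, and real quantization formats usually end up somewhat above their nominal bits per weight, so treat the numbers as a lower bound.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    # params_billions * 1e9 weights * (bits / 8) bytes each, divided by 1e9
    return params_billions * bits_per_weight / 8


def fits(params_billions: float, bits_per_weight: float,
         ram_gb: float, overhead_gb: float = 8.0) -> bool:
    """True if the weights plus an assumed flat overhead fit in ram_gb.

    overhead_gb is a hypothetical allowance for the OS, KV cache and
    runtime buffers; tune it for your own setup.
    """
    return weight_memory_gb(params_billions, bits_per_weight) + overhead_gb <= ram_gb


if __name__ == "__main__":
    # Compare a 70B model at several nominal quantization levels on a 64 GB box.
    for bits in (16, 8, 4, 2):
        size = weight_memory_gb(70, bits)
        print(f"70B at {bits}-bit: ~{size:.1f} GB weights, "
              f"fits in 64 GB: {fits(70, bits, 64)}")
```

Under these assumptions a 70B model at 4-bit is about 35 GB of weights, which leaves headroom on a 64GB machine, while the same model at 8-bit does not fit. Speed is a separate question that this sketch says nothing about.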
Yes, this is true, but that's not what I'm saying.
I'm saying that 64GB+ personal computers are vanishingly rare outside builds that were specifically done with AI in mind.
Gamers never saw the need for them, and even in software development 32GB was the standard until AI came along.
Yes, there were specialized use cases where they did exist, and yes, some people just wanted to max out their MacBooks, but... it was rare.
I always max out a machine, which is why I am stuck on DDR4! I can't imagine the cost of maxing out a DDR5 machine released in the last few months.