Comment by andrewstuart
14 days ago
Self-hosting LLMs will explode in popularity over the next 12 months.
Open models are made much more interesting, exciting and relevant by new generations of AI-focused hardware such as the AMD Strix Halo and the Apple Mac Studio M3.
GPUs have failed to meet the demand for lower cost and more memory, so APUs look like the future for self-hosted LLMs.
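For a rough sense of the tradeoff, here are ballpark public specs (approximate figures, worth double-checking):

    # Ballpark memory capacity vs. bandwidth (approximate public figures):
    systems = {
        # name: (memory_gb, bandwidth_gb_per_s)
        "AMD Strix Halo (128GB unified)": (128,  256),
        "Apple M3 Ultra (512GB unified)": (512,  819),
        "NVIDIA RTX 4090 (24GB VRAM)":    (24,  1008),
    }
    for name, (mem_gb, bw_gbps) in systems.items():
        print(f"{name}: {mem_gb} GB @ ~{bw_gbps} GB/s")

The APUs trade bandwidth for capacity; the discrete GPU has the bandwidth but nowhere near the memory.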
> new generations of AI focused hardware
Some benchmarks are not encouraging. See e.g. https://www.hardware-corner.net/mac-studio-m3-ultra-deepseek...
That "AI focused hardware" will either have extremely fast memory at a prohibitive price, or a reasonable price with limitations that remain to be assessed.
Errrr that’s a 671B model.
Yes, but what will you actually need to run to cover your personal use case?
We are far from having optimal technology at trivial cost. State-of-the-art commercial VRAM is over 10x faster than standard memory, and costs well over 10x as much.
The speeds that are reasonably affordable may or may not be acceptable.
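Back-of-envelope, assuming decode is memory-bandwidth bound (weights streamed once per generated token; KV cache and overheads ignored):

    def est_tokens_per_sec(params_b, bytes_per_param, bandwidth_gbps):
        # t/s ~= bandwidth / bytes read per generated token
        weights_gb = params_b * bytes_per_param
        return bandwidth_gbps / weights_gb

    # A dense 70B model with 8-bit weights on the two APUs above:
    for name, bw in [("Strix Halo ~256 GB/s", 256), ("M3 Ultra ~819 GB/s", 819)]:
        print(f"{name}: ~{est_tokens_per_sec(70, 1.0, bw):.1f} t/s")

Roughly 4 t/s vs 12 t/s for a dense 70B: fine for one patient user, tight for anything interactive.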
For a single user, maybe. But for small teams GPUs are still the only viable option once you consider t/s and concurrency. Nvidia's latest RTX 6000 Pro series is actually reasonably priced for the VRAM and wattage you get. An 8x box starts at 75k EUR and can host up to DS3 / R1 / Llama 4 in 8-bit with decent speed, context and concurrency.
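A quick sizing check (assuming 96 GB per card, eight cards, and ~1 byte/param at 8-bit):

    # Does a 671B-parameter model in 8-bit fit on an 8x 96GB box?
    params_b      = 671             # DeepSeek-R1 total parameters (billions)
    weights_gb    = params_b * 1.0  # 8-bit quantization ~= 1 byte/param
    total_vram_gb = 8 * 96          # 768 GB across eight cards
    print(f"weights ~{weights_gb:.0f} GB, VRAM {total_vram_gb} GB, "
          f"~{total_vram_gb - weights_gb:.0f} GB left for KV cache/activations")

It fits with roughly 100 GB of headroom, which is what makes the long-context, multi-user case workable.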
What teams bother to do that, though? It's easier to call an API or spin up a cloud cluster.
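Part of why the API route wins by default: the client code is identical either way. A minimal sketch against an OpenAI-compatible endpoint (vLLM and most cloud providers expose this interface; the URL and model name below are placeholders):

    from openai import OpenAI

    # Point base_url at a cloud provider or a self-hosted server;
    # nothing else in the calling code changes.
    client = OpenAI(
        base_url="http://localhost:8000/v1",  # hypothetical local vLLM endpoint
        api_key="not-needed-locally",
    )
    resp = client.chat.completions.create(
        model="deepseek-r1",                  # whichever model the server loaded
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(resp.choices[0].message.content)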