Comment by andrewstuart
14 days ago
Self-hosting LLMs will explode in popularity over the next 12 months.
Open models are made much more interesting, exciting and relevant by new generations of AI-focused hardware such as the AMD Strix Halo and the Apple Mac Studio M3.
GPUs have failed to meet the demand for lower cost and more memory, so APUs look like the future for self-hosted LLMs.
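For a rough sense of the tradeoff, here are ballpark public specs (approximate figures, worth double-checking):

    # Ballpark memory capacity vs. bandwidth (approximate public figures):
    systems = {
        # name: (memory_gb, bandwidth_gb_per_s)
        "AMD Strix Halo (128GB unified)": (128,  256),
        "Apple M3 Ultra (512GB unified)": (512,  819),
        "NVIDIA RTX 4090 (24GB VRAM)":    (24,  1008),
    }
    for name, (mem_gb, bw_gbps) in systems.items():
        print(f"{name}: {mem_gb} GB @ ~{bw_gbps} GB/s")

The APUs trade bandwidth for capacity; the discrete GPU has the bandwidth but nowhere near the memory.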
> new generations of AI focused hardware
Some benchmarks are not encouraging. See e.g. https://www.hardware-corner.net/mac-studio-m3-ultra-deepseek...
That "AI focused hardware" will either have extremely fast memory at a prohibitive price, or a reasonable price with limitations that remain to be assessed.
Errrr that’s a 671B model.
Yes, but what will you actually need to run to cover your personal use case?
We are far from having optimal technology at trivial cost. State-of-the-art commercial VRAM is over 10x faster than standard memory, and costs well over 10x as much.
The speeds that are reasonably affordable may or may not be acceptable.
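Back-of-envelope, assuming decode is memory-bandwidth bound (weights streamed once per generated token; KV cache and overheads ignored):

    def est_tokens_per_sec(params_b, bytes_per_param, bandwidth_gbps):
        # t/s ~= bandwidth / bytes read per generated token
        weights_gb = params_b * bytes_per_param
        return bandwidth_gbps / weights_gb

    # A dense 70B model with 8-bit weights on the two APUs above:
    for name, bw in [("Strix Halo ~256 GB/s", 256), ("M3 Ultra ~819 GB/s", 819)]:
        print(f"{name}: ~{est_tokens_per_sec(70, 1.0, bw):.1f} t/s")

Roughly 4 t/s vs 12 t/s for a dense 70B: fine for one patient user, tight for anything interactive.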
For a single user, maybe. But for small teams GPUs are still the only viable option once you consider t/s and concurrency. Nvidia's latest RTX 6000 Pro series is actually reasonably priced for the VRAM and wattage you get. An 8x box starts at 75k EUR and can host up to DS3 / R1 / Llama 4 in 8-bit with decent speed, context and concurrency.
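A quick sizing check (assuming 96 GB per card, eight cards, and ~1 byte/param at 8-bit):

    # Does a 671B-parameter model in 8-bit fit on an 8x 96GB box?
    params_b      = 671             # DeepSeek-R1 total parameters (billions)
    weights_gb    = params_b * 1.0  # 8-bit quantization ~= 1 byte/param
    total_vram_gb = 8 * 96          # 768 GB across eight cards
    print(f"weights ~{weights_gb:.0f} GB, VRAM {total_vram_gb} GB, "
          f"~{total_vram_gb - weights_gb:.0f} GB left for KV cache/activations")

It fits with roughly 100 GB of headroom, which is what makes the long-context, multi-user case workable.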
What teams bother to do that, though? It's easier to call an API or spin up a cloud cluster.
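Part of why the API route wins by default: the client code is identical either way. A minimal sketch against an OpenAI-compatible endpoint (vLLM and most cloud providers expose this interface; the URL and model name below are placeholders):

    from openai import OpenAI

    # Point base_url at a cloud provider or a self-hosted server;
    # nothing else in the calling code changes.
    client = OpenAI(
        base_url="http://localhost:8000/v1",  # hypothetical local vLLM endpoint
        api_key="not-needed-locally",
    )
    resp = client.chat.completions.create(
        model="deepseek-r1",                  # whichever model the server loaded
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(resp.choices[0].message.content)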