Comment by 0gs

1 day ago

yeah, on a 96GB Mac Studio and Gemma+Qwen, it's definitely fully doable. fully doable but not really for coding on 16GB. but svelter models and cheaper (eventually) hardware are coming!

"cheaper (eventually) hardware" Best case 2-3 years from now. Otherwise it will take a major global recession to get us anywhere near last year's prices.

Macs are expensive hardware, but I'm always seeing people running LLMs on them. Is anyone running on cheaper generic hardware and Linux?

  • Qwen3.6-35B-A3B-Q4_K_M.gguf spread across a few 8-16GB GPUs is as cheap as the reward points on a comparable Mac, if you don't mind heat, noise, and not-blazing-fast generation speeds.

    Most ATX cases have only 7 PCIe I/O shields and can't take more than 3x double-slot cards, but many gaming systems can take 2x double-slot full-length 16GB cards, and those should be fine for many purposes. Cooling is most easily done with a squirrel-cage fan mounted on a 3D-printed bracket at the back.

    Cheap parallel-action crimping tools for Molex 5556 work too. Note that PCIe 8-pin is NOT 5557: it's keyed differently, so if you are crimping your own cables you have to use housings made specifically for PCIe.

    No one is mining crypto anymore, and crypto PSUs are being dumped dirt cheap, should you want a stable bulk 12V supply.
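A rough way to sanity-check the "2x 16GB cards should be fine" claim is to estimate the size of the quantized weights and compare it with the pooled VRAM. This is only a back-of-envelope sketch: the ~4.85 bits per weight for Q4_K_M and the 4 GB allowance for KV cache and runtime buffers are assumptions, not measurements, and the helper names are made up for illustration.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB."""
    return params_billion * bits_per_weight / 8


def fits(params_billion, bits_per_weight, gpu_vram_gb, overhead_gb):
    """Return (fits?, GB needed) for a model split across several GPUs.

    Ignores per-GPU fragmentation and uneven layer splits, so treat a
    narrow pass as "maybe".
    """
    need = weights_gb(params_billion, bits_per_weight) + overhead_gb
    return need <= sum(gpu_vram_gb), need


# ASSUMPTIONS: 35B total parameters, ~4.85 bits/weight for Q4_K_M,
# 4 GB for KV cache and buffers.
for cards in ([16, 16], [8, 8, 8], [8, 8]):
    ok, need = fits(35, 4.85, cards, overhead_gb=4.0)
    print(cards, "fits" if ok else "too small", f"(needs ~{need:.1f} GB)")
```

Under these assumptions the weights alone come to roughly 21 GB, so two 16GB cards have headroom while two or three 8GB cards do not.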

I suspect hosted and local will converge when hardware prices come down and API prices go up. The massive rate of datacenter build-out will be unsustainable. Right now the hosted models are massively cheaper than buying the hardware and running it yourself, which signals that hosted is heavily subsidized.

If you don't have that hardware, the math of buying a depreciating computer is challenging if you are satisfied with the $100/month plans ($1200/year). A 96GB Mac Studio is ~$4k. I think if you have the hardware already as a sunk cost, then yes, it makes sense. But I'm not sure it is worth spending $4k on today's hardware vs waiting for newer hardware in a few years.
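The break-even point implied by those numbers can be worked out directly. A minimal sketch using only the figures quoted above (~$4k hardware, $100/month plan); it deliberately ignores electricity, resale value, and any future change in hosted pricing, all of which would shift the result.

```python
def breakeven_months(hardware_cost: float, monthly_plan: float) -> float:
    """Months of subscription fees that add up to the hardware price."""
    return hardware_cost / monthly_plan


months = breakeven_months(4000, 100)
print(f"{months:.0f} months (~{months / 12:.1f} years)")
```

That is a bit over three years of subscription before the purchase pays for itself, which is about the same horizon as the "newer hardware in a few years" being waited on.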