Comment by gizajob

7 hours ago

I really don’t get it. Why not put a Mac Studio with 128 GB of RAM on every engineer's desk and be like “engineer, engineer your local LLM”? It makes no sense to be spending $20,000 to $30,000+ per year on cloud providers when Qwen et al. are available. And it makes even less sense to be sending all your company code and data to Anthropic and OpenAI when you can keep all that IP in the building.

The Mac is very feeble compared to the big iron the providers run, so performance will be much lower. Also, many companies would prefer their engineers work on domain problems instead of working on novel LLMs.

Because it’s cheaper to pay for the tokens than to pay their engineers to worry about a worse, homebrewed setup.

Because local models that run well in 128 GB of RAM are still not SOTA. Yes, Qwen is amazing, but neither Qwen 27B nor Qwen 35B can outperform Opus 4.6. So why add even more rework for your engineers when you can pay slightly more and always use SOTA, at least until others figure out best practices for running SOTA models locally?