Comment by datadrivenangel

1 day ago

I did the math, at least for a MacBook Pro, and for inference it's definitely not worth it.

- https://www.williamangel.net/blog/2026/05/17/offline-llm-ene... - Discussion: https://news.ycombinator.com/item?id=48168198

That's the case with self-hosting anything. It's the privacy that matters.

  • Not necessarily. I was spending ~$150/month on Vultr's Kubernetes hosting. I spent $5k building out a pretty awesome 1U server and put it in a colo that costs me $50/month. Next year I will break even financially, and everything after that is money saved. I'm also getting much more out of this server than I was getting on Vultr because I over-spec'd the machine. In addition to running more on my cluster, I spin up large virtual machines for development, experiments, and offloading distributed builds. No shade to Vultr, but owning my hardware instead of renting was absolutely the way to go. Unfortunately, today the RAM alone would cost over $5k, so the math has changed.
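
For anyone checking the break-even claim, here is a minimal sketch using only the figures quoted in the comment above ($5k upfront, $150/month before, $50/month after). The function name is made up for illustration, and the calculation assumes nothing else changes: no hardware failures, no resale value, no extra power or bandwidth fees beyond the colo price.

```python
def break_even_months(upfront_usd, old_monthly_usd, new_monthly_usd):
    """Months until the upfront cost is repaid by the lower monthly bill."""
    monthly_saving = old_monthly_usd - new_monthly_usd
    if monthly_saving <= 0:
        raise ValueError("new setup is not cheaper per month, so it never breaks even")
    return upfront_usd / monthly_saving


# Figures taken from the comment: $5k server, $150/month rented, $50/month colo.
months = break_even_months(5000, 150, 50)
print(f"{months:.0f} months (~{months / 12:.1f} years)")  # 50 months (~4.2 years)
```

At a $100/month saving, that is roughly four years to break even, which fits "next year" only if the server has already been running for about three years.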

One value of learning on my MacBook is that MPS is not as well supported as CUDA, which forces me to go down roads I would not otherwise have traveled.

  • That's more of a disadvantage. CUDA is an industry standard; MPS/MLX/Metal compute shaders are a novelty.

Except this math is 10x too high (unless accelerated depreciation is all of it): a million tokens at 28 tokens/sec, 75 W, and $0.20/kWh should cost about $0.15 in electricity, not $1.50. (And less with MTP.)
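
The arithmetic behind that figure can be reproduced in a few lines. This is only a sketch of the electricity cost using the numbers from the comment above (28 tokens/sec, 75 W, 20 cents per kWh); it ignores hardware depreciation, input tokens, and batching, and the function name is illustrative.

```python
def energy_cost_per_million_tokens(tokens_per_sec, watts, usd_per_kwh, tokens=1_000_000):
    """Electricity cost in USD to generate `tokens` at a steady rate and power draw."""
    seconds = tokens / tokens_per_sec      # time spent generating
    kwh = watts * seconds / 3600 / 1000    # watt-seconds -> watt-hours -> kWh
    return kwh * usd_per_kwh


# Numbers from the comment: 28 tok/s, 75 W, $0.20/kWh.
cost = energy_cost_per_million_tokens(28, 75, 0.20)
print(f"${cost:.2f} per million tokens")  # $0.15 per million tokens
```

A million tokens at 28 tokens/sec takes just under 10 hours, which at 75 W is about 0.74 kWh.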

It's comparing laptops to dedicated GPUs in a server environment. The best comparison would be the Mac Studio but the current release is almost 2 years old at this point. We'll see what a likely M5 Ultra Mac Studio looks like, probably in Q3 this year.

But yes, for pure inference, the M5 Max MacBook Pros probably aren't there yet. They have other utility, though, of course. And you can get 64GB and 128GB MBPs at a discount: Micro Center will currently let you buy a 64GB M5 Max MBP for under $4k, for example.

Why didn't you take into account batching, input tokens, different costs of electricity, and the fact that a laptop can still hold a decent % of its resale value, and is useful for many other tasks than running an LLM?

  • > Why didn't you take into account [...] the fact that a laptop can still hold a decent % of its resale value, and is useful for many other tasks than running an LLM?

    Because that wasn't what they claimed to research?

      >> for inference it's definitely not worth it.
    

    It's entirely fine if you enjoy local LLMs on your computer; there are people doing horribly inefficient inference on smartphones now. But for pure inference tasks, it's pretty obvious why M5s and Mac Studios aren't replacing TPUs and GPUs.

    • Who is going to buy a $4299 M5 Max MBP with 64GB of RAM just to run Gemma 4 31b? Firstly, you don't need 64GB for that model. Secondly, if you want a machine that sits in the corner and does nothing but LLM inference, you don't buy a MacBook Pro; you buy some GPUs, which are going to cost you a fraction of that (~$1k for ~64GB of VRAM is possible). The people buying Apple Silicon for inference generally aim for the Mac Studios with enormous amounts of RAM (128-512GB), to run very large models.

      The idea is obviously to be running the LLM on your work laptop. As a developer I'd need a laptop with 24GB of RAM for work anyway, and 48GB, which is enough for a very good quant of Gemma, is just $400 extra.
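
As a rough check on the RAM sizes being argued about here, this sketch estimates the memory needed just for the weights of a 31B-parameter model at a few quantization levels. The parameter count is read off the "31b" in the model name, the bit widths are illustrative assumptions, and the estimate ignores the KV cache, activations, and runtime overhead, so real usage will be higher.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS = 31e9  # assumed from the "31b" in the model name

for bits in (4, 8, 16):
    print(f"{bits}-bit: ~{weight_memory_gb(PARAMS, bits):.1f} GB")
# 4-bit: ~15.5 GB
# 8-bit: ~31.0 GB
# 16-bit: ~62.0 GB
```

Under those assumptions, an 8-bit quant leaves headroom on a 48GB machine, while full 16-bit weights would not fit alongside anything else.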
