Comment by dist-epoch

5 days ago

using it 24/7 brings the average cost down, not up.

the less you use a local LLM, the less sense it makes, since you paid a lot for hardware you don't use
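To make the amortization argument concrete, here is a rough sketch of average cost per hour of actual use at different utilization levels. The $4,000 price echoes the "over 4k" figure mentioned in this thread; the 3-year lifetime, 100 W draw, and $0.15/kWh electricity price are placeholder assumptions, not measurements.

```python
def cost_per_hour(hardware_usd, hours_used_per_day,
                  lifetime_years=3.0, watts=100.0, usd_per_kwh=0.15):
    """Average cost of one hour of actual use: amortized hardware plus electricity.

    lifetime_years, watts, and usd_per_kwh are illustrative assumptions.
    """
    if hours_used_per_day <= 0:
        raise ValueError("hours_used_per_day must be positive")
    # Total hours the machine is actually used over its assumed lifetime.
    total_hours = hours_used_per_day * 365 * lifetime_years
    # The fixed hardware price is spread over those hours.
    hardware_per_hour = hardware_usd / total_hours
    # Electricity is only counted while the machine is in use.
    energy_per_hour = watts / 1000.0 * usd_per_kwh
    return hardware_per_hour + energy_per_hour


for hours in (1, 8, 24):
    print(f"{hours:>2} h/day: ${cost_per_hour(4000, hours):.3f} per hour of use")
```

The hardware term shrinks as usage grows while the energy term stays flat per hour, so under these assumptions the per-hour figure falls toward the electricity cost as utilization approaches 24/7.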

7 comments

dist-epoch

That's the point: why would you buy a device that's specifically not optimized to be used for 24/7 inference? It's expensive hardware that's not designed to be used in that situation! The power use for inference isn't especially good and you're not getting even a fraction of the benefit from the hardware that you're paying for.

dist-epoch 5 days ago
> why would you buy a device that's specifically not optimized to be used for 24/7 inference
because it costs $1k-$2k instead of $10k-$30k+ for optimized devices
- bastawhiz 5 days ago
  
  Nobody is suggesting you buy a pair of A100s, which is what $15k gets you these days. Get a used 5090. And the author specifically priced the hardware at over $4k, which is double the $1k-$2k you're noting.
apf6 5 days ago

Good question but people are doing it anyway. It's a fact that right now tons of people are buying Mac Minis specifically for this use case, to treat them as their personal data center for agents. The concept of "power use for inference" is foreign. Those people are the ones that motivated this blog post I think.

groundzeros2015 5 days ago

The hardware has multiple uses for the same cost. The pay-per-use server does not.

bastawhiz 5 days ago
The author isn't pricing in the multiple uses. You either compare it apples to apples or you don't. If you're using the machine for general purpose computing on top of inference then the amortized hardware costs are pointless to measure. This is exactly what I said.
- groundzeros2015 4 days ago
  
  OK, you can resell it at the end.
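  A minimal sketch of how resale changes the hardware figure that gets amortized. The 40% resale fraction is a purely hypothetical number, and the $4,000 price is the "over 4k" figure from earlier in the thread.

```python
def net_hardware_cost(purchase_usd, resale_fraction):
    """Hardware cost left to amortize after selling the machine.

    resale_fraction is the share of the purchase price recovered on resale
    (a hypothetical input; actual resale values vary).
    """
    if not 0.0 <= resale_fraction <= 1.0:
        raise ValueError("resale_fraction must be between 0 and 1")
    return purchase_usd * (1.0 - resale_fraction)


# Hypothetical: recover 40% of a $4,000 purchase when selling it on.
print(f"Net hardware cost: ${net_hardware_cost(4000, 0.40):,.0f}")
```

  Whatever the real resale fraction turns out to be, it lowers the amount spread over hours of use; the pay-per-use side of the comparison has no equivalent term.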