Comment by dist-epoch
5 days ago
Using it 24/7 brings the average cost down, not up.
The less you use a local LLM, the less sense it makes, since you paid a lot for hardware you don't use.
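To make the amortization argument concrete, here is a rough sketch of average cost per hour of use as a function of daily usage. Every figure in it (a $2,000 machine, a 3-year lifetime, 100 W draw, $0.15/kWh) is a hypothetical placeholder, not a number from the blog post or this thread, and `cost_per_hour` is just an illustrative helper name.

```python
def cost_per_hour(hardware_usd, lifetime_years, hours_per_day, watts, usd_per_kwh):
    """Average cost of one hour of use: amortized hardware plus electricity."""
    hours_used = lifetime_years * 365 * hours_per_day
    amortized_hardware = hardware_usd / hours_used   # shrinks as usage grows
    electricity = (watts / 1000) * usd_per_kwh       # constant per hour of use
    return amortized_hardware + electricity


# Hypothetical inputs, chosen only to show the trend.
for hours_per_day in (1, 8, 24):
    hourly = cost_per_hour(2000, 3, hours_per_day, 100, 0.15)
    print(f"{hours_per_day:>2} h/day: ${hourly:.3f} per hour of use")
```

Under these assumptions the electricity term is the same for every hour of use, so the only thing that moves is the hardware share, which falls as the machine is used more.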
That's the point: why would you buy a device that's specifically not optimized to be used for 24/7 inference? It's expensive hardware that's not designed to be used in that situation! The power use for inference isn't especially good and you're not getting even a fraction of the benefit from the hardware that you're paying for.
> why would you buy a device that's specifically not optimized to be used for 24/7 inference
Because it costs $1k-$2k instead of $10k-$30k+ for optimized devices.
Nobody is suggesting you buy a pair of A100s, which is what $15k gets you these days. Get a used 5090. And the author specifically priced the hardware at over $4k, which is double the $1k-$2k you're noting.
Good question, but people are doing it anyway. It's a fact that right now tons of people are buying Mac Minis specifically for this use case, to treat them as their personal data center for agents. The concept of "power use for inference" is foreign to them. I think those people are the ones who motivated this blog post.
The hardware has multiple uses for the same cost. The pay-per-use server does not.
The author isn't pricing in the multiple uses. You either compare it apples to apples or you don't. If you're using the machine for general purpose computing on top of inference then the amortized hardware costs are pointless to measure. This is exactly what I said.
OK, you can resell it at the end.
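As a rough sketch of how resale changes the math: only the difference between purchase price and resale price needs to be amortized over the hours of use. The numbers here (a $2,000 purchase, $800 resale, three years of 24/7 use) are hypothetical, and `net_cost_per_hour` is an illustrative helper, not anything from the blog post.

```python
def net_cost_per_hour(purchase_usd, resale_usd, hours_used):
    """Hardware cost per hour of use after subtracting resale value."""
    return (purchase_usd - resale_usd) / hours_used


hours = 3 * 365 * 24  # hypothetical: three years of 24/7 use
print(f"no resale:   ${net_cost_per_hour(2000, 0, hours):.4f}/h")
print(f"with resale: ${net_cost_per_hour(2000, 800, hours):.4f}/h")
```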