Comment by c7b
17 hours ago
1.5k. For two months of that spend you could buy a machine that can self-host decent models, plus a year's worth of electricity. It's not top-tier in terms of quality, but with a bit more effort it works pretty decently. I'm completely baffled that this isn't far more common. Is it really just the quality?
Seconding this. At a recent Alibaba Qwen conference they presented an all-in-one box ("DC in a box"; I think it was called Apsara, roughly 0.6 x 0.6 x 1.5 m): plug and play, 1.5 TB of GPU RAM, able to run in a fully air-gapped environment, any open models... All of that is roughly $300k, one time. The box can do non-LLM tasks as well. Throughput is around 20k tokens/s, and delivery takes around 2 months. For any medium-sized company it's perhaps cheaper to buy it once than to keep spending 1.5k per user on cloud.
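The comparison above boils down to a break-even calculation. This is a rough sketch that assumes the 1.5k figure is per user per month (as the parent comment implies) and deliberately ignores electricity, maintenance, staff time and depreciation; the function name `break_even_months` is illustrative only.

```python
import math


def break_even_months(box_cost: float, users: int, cloud_cost_per_user_month: float) -> int:
    """Whole months until a one-time purchase costs less than per-user cloud spend.

    Assumption: running costs of the box (power, ops, depreciation) are ignored.
    """
    if users <= 0 or cloud_cost_per_user_month <= 0:
        raise ValueError("users and cloud cost must be positive")
    monthly_cloud = users * cloud_cost_per_user_month
    # Round up: a partial month still counts as a month of cloud spend.
    return math.ceil(box_cost / monthly_cloud)


if __name__ == "__main__":
    # Figures from the comment: ~$300k box, ~1.5k per user (monthly is an assumption).
    for users in (20, 50, 100, 200):
        print(users, break_even_months(300_000, users, 1_500))
```

Under these assumptions a 50-person team would recoup the box in about four months; adding real running costs pushes that out, which is where the argument gets interesting.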
Where can I find more information on this? A web search didn’t reveal much for me.
Decent vs best-money-can-buy. Further, a self-hosted LLM will be much slower.
I think we're all past the "best-money-can-buy" stage. The most expensive models are an order of magnitude more expensive than the middle-ground ones, so you need to be selective about what you run where.
And with a bit of careful routing, there isn't much stopping you from sending the hard stuff to a cloud model and the average stuff to an on-prem model.
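A minimal sketch of that kind of routing, assuming a crude keyword-and-length heuristic for "hard" prompts. The backend names `"cloud"` and `"on_prem"`, the `HARD_MARKERS` list and the length threshold are all hypothetical; a real setup might use a small classifier or let the local model escalate when it is unsure.

```python
from dataclasses import dataclass


@dataclass
class Route:
    backend: str  # hypothetical backend label: "cloud" or "on_prem"
    reason: str


# Hypothetical markers of "hard" requests; tune or replace with a classifier.
HARD_MARKERS = ("prove", "refactor", "architecture", "debug")


def route(prompt: str, max_local_chars: int = 4_000) -> Route:
    """Send long or hard-looking prompts to the cloud, everything else on-prem."""
    text = prompt.lower()
    if len(prompt) > max_local_chars:
        # Assumption: the local model has a smaller usable context window.
        return Route("cloud", "long context")
    if any(marker in text for marker in HARD_MARKERS):
        return Route("cloud", "looks hard")
    return Route("on_prem", "default")
```

For example, `route("Summarize this email")` stays on-prem, while `route("Debug this race condition")` goes to the cloud model.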
Only people who do pay-per-use optimize this. Most heavy users have their use covered by an employer.
I'd think for most companies the pace of change is too high at the moment. Give it a few years and a bit of a plateau in frontier-model improvements, and I can't see how many of these companies avoid imploding under price competition on inference.