Comment by gmerc

11 hours ago

It’s just like airline reward miles and offers no benefit to companies over just renting bare-metal GPU time.

2 comments

gmerc

I hope this horrible time will soon be over, once cheaper NPUs become available from more hardware companies and model sizes get optimized down further.

I wonder what hyperscaled compute farms and models will be good for at that running cost when most AI needs can be fulfilled by on-prem and on-device hardware and models. Probably the only customers left will be big governments. So in the end the taxpayer has to pay for those billions of investments by the AI cartel.

zozbot234 11 hours ago

The typical NPU is only marginally helpful for on-prem inference. A GPU can read quantized data from main memory and dequantize/pad it locally (making effective use of memory throughput); an NPU often needs to read padded data directly from memory, which is wasteful. So it only helps a little bit wrt. prefill.
Also, smaller models can obviously be used, but a smaller model will be a lot weaker in real-world knowledge, and this tends to limit its smarts in a way that can't be compensated for by more thinking.
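
To put rough numbers on the memory-throughput point above, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration, not a measurement: an 8B-parameter model, 100 GB/s of memory bandwidth, 4-bit packed weights on the "dequantize locally" path, and weights padded to 16 bits on the "read padded data" path. It also assumes token generation is purely bandwidth-bound with every weight read once per token, and it ignores the KV cache and activations.

```python
# Rough upper bound on bandwidth-bound token generation.
# All figures below are illustrative assumptions, not benchmarks.

def weight_bytes(n_params: int, bits_per_weight: float) -> float:
    """Bytes that must be read from main memory to touch every weight once."""
    return n_params * bits_per_weight / 8


def max_tokens_per_second(n_params: int, bits_read_per_weight: float,
                          bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/s if each token requires reading all weights once."""
    bytes_per_token = weight_bytes(n_params, bits_read_per_weight)
    return bandwidth_gb_s * 1e9 / bytes_per_token


N_PARAMS = 8_000_000_000   # assumed model size
BANDWIDTH_GB_S = 100.0     # assumed main-memory bandwidth

# Path 1: read 4-bit packed weights, dequantize on the compute unit.
packed = max_tokens_per_second(N_PARAMS, 4, BANDWIDTH_GB_S)

# Path 2: read weights already padded to 16 bits in memory.
padded = max_tokens_per_second(N_PARAMS, 16, BANDWIDTH_GB_S)

print(f"packed 4-bit reads : {packed:.2f} tokens/s upper bound")
print(f"padded 16-bit reads: {padded:.2f} tokens/s upper bound")
print(f"ratio              : {packed / padded:.1f}x")
```

Under these assumptions the padded path moves four times as many bytes per token, so its bandwidth-bound ceiling is a quarter of the packed path's, no matter how fast the compute units are. The actual padding width on any given NPU may differ.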