Comment by gmerc

11 hours ago

It’s just like airline reward miles, and it offers no benefit to companies over simply renting bare-metal GPU time.

I hope this horrible time will soon be over, once cheaper NPUs become available from more hardware companies and model sizes are optimized down further.

I wonder what hyperscale compute farms and models will be good for at that running cost, when most AI needs can be met by on-prem and on-device hardware and models. Probably the only customers left will be big governments. So in the end, the taxpayer will have to pay for those billions in investment by the AI cartel.

  • The typical NPU is only marginally helpful for on-prem inference. A GPU can read quantized data from main memory and dequantize/pad it locally, making effective use of memory throughput; an NPU often needs to read already-padded data directly from memory, which wastes bandwidth. So it only helps a little, and mainly with prefill.

    Smaller models can of course be used, but a smaller model will be much weaker in real-world knowledge, and this tends to limit its smarts in a way that more thinking can't compensate for.
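A rough back-of-the-envelope sketch of why the padded reads matter. It assumes token generation is purely memory-bound (every token streams all weights once, KV cache ignored) and that the NPU needs 4-bit weights stored padded to 8 bits. The model size and bandwidth figures are made-up illustrative numbers, not measurements of any real chip.

```python
# Upper bound on decode speed when generation is memory-bandwidth-bound:
# each generated token requires reading every weight from memory once.

def bytes_per_token(n_params: float, bits_per_weight: float) -> float:
    """Bytes read per generated token (weights only, KV cache ignored)."""
    return n_params * bits_per_weight / 8

def max_tokens_per_sec(bandwidth_gb_s: float, n_params: float, bits_per_weight: float) -> float:
    """Tokens/s ceiling if memory bandwidth is the only limit."""
    return bandwidth_gb_s * 1e9 / bytes_per_token(n_params, bits_per_weight)

# Hypothetical numbers, for illustration only.
N_PARAMS = 8e9      # an assumed 8B-parameter model
BANDWIDTH = 100.0   # an assumed 100 GB/s of memory bandwidth

# GPU-style path: read 4-bit weights, dequantize on-chip.
gpu = max_tokens_per_sec(BANDWIDTH, N_PARAMS, 4)
# NPU-style path (assumed): weights sit in memory already padded to 8 bits.
npu = max_tokens_per_sec(BANDWIDTH, N_PARAMS, 8)

print(f"4-bit read + on-chip dequant: {gpu:.1f} tok/s")
print(f"padded 8-bit read:            {npu:.1f} tok/s")
```

With these assumed numbers the padded path halves the decode ceiling (25.0 vs 12.5 tok/s). Extra NPU compute doesn't change that ceiling, which is why it mostly helps the compute-heavy prefill phase.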