Comment by NeutralCrane
1 year ago
My guess is two things:
1. Economies of scale. Cloud providers run clusters with tens of thousands of GPUs, so they can batch requests and keep hardware utilized far more efficiently than you could on a small cluster built just for your own needs.
2. As you mentioned, they are selling at a loss. OpenAI is hugely unprofitable, and they reportedly lose money on every query.
The purchase price for an H100 is also dramatically lower when you buy a few thousand at a time.