Comment by lukeschlather
16 hours ago
My impression is there's a definite shortage of GPUs, and if OpenAI is more reliable it's because they have fewer customers relative to the number of GPUs they have. I don't think Google is handing out 429s because they are worried about overspending; I think it's because they literally cannot serve the requests.
This sounds very plausible. OpenAI has hoarded 40% of the world's RAM supply, which they likely have no use for other than to starve the competition. They (or other competitors) could be using the same strategy for other hardware.
Which is worrying, because if this continues, and even Google, which runs GCP, is struggling to serve requests, there's no telling what will happen to smaller providers like Hetzner.
> OpenAI has hoarded 40% of world's RAM supply
I believe OpenAI's purchasing is somewhat overstated, and it certainly has no effect on Google's current ability to serve Gemini requests. But it is obvious that there's a shortage of most components, and it's also obvious that even internally Google is having to make hard choices about who gets GPUs and when.
I definitely think OpenAI has less use for GPUs than Google does. Google has ~$300B in annual revenue vs. ~$20B for OpenAI. Even if you assume 100% of OpenAI's revenue goes to renting GPUs and they are selling at a 50% loss, there's still plenty of room for Google to be profitable, outspend OpenAI on GPUs, and still not have enough of them. Google also simply has a wider variety of models to train and run, from Waymo to Search to whatever advertising models.
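To make the arithmetic above concrete — a back-of-the-envelope sketch using the rough revenue figures from the comment (these are the comment's estimates, not audited numbers):

```python
# Rough annual-revenue estimates from the comment above.
google_revenue = 300e9   # ~$300B
openai_revenue = 20e9    # ~$20B

# Worst-case assumption: OpenAI spends all revenue on GPUs and sells at a
# 50% loss, so revenue covers only half the GPU bill -> spend is 2x revenue.
openai_gpu_spend = openai_revenue * 2

print(openai_gpu_spend / 1e9)             # 40.0 ($40B implied GPU spend)
print(google_revenue / openai_gpu_spend)  # 7.5  (Google's revenue multiple)
```

Even under that maximal assumption, Google's revenue is 7.5x OpenAI's implied GPU spend, so Google can out-spend OpenAI on GPUs and still be GPU-constrained.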
Hetzner has already announced price increases: https://docs.hetzner.com/general/infrastructure-and-availabi...
OpenAI is dependent on the same hyperscalers (most notably Microsoft/Azure) as everyone else, and even has access to preferential pricing through their partnership.
A better explanation is that ChatGPT is still far and away the most popular AI product on the planet. With so many more users, multi-tenant statistical effects are stronger: load is spread across a larger pool of GPUs. Think of S3: requests for a million files may be served from literally a million drives due to the sheer size of S3, and even a huge spike from a single customer fades into relatively stable overall load thanks to the sheer number of tenants. OpenAI likely has similar hardware efficiencies at play and can therefore afford to be more generous with spikes from individual customers using OpenCode etc.
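The smoothing effect described here is statistical multiplexing: independent bursty tenants average out, so the aggregate load's relative variability shrinks roughly as 1/sqrt(N). A toy simulation (the load model is made up for illustration — each tenant idles at load 1 and spikes to load 10 five percent of the time):

```python
import random
import statistics

def aggregate_load_cv(num_tenants, steps=2_000, seed=0):
    """Coefficient of variation (stddev / mean) of the total load when each
    tenant independently draws a bursty per-step demand."""
    rng = random.Random(seed)
    totals = []
    for _ in range(steps):
        # Each tenant is mostly idle (load 1) but spikes (load 10) 5% of the time.
        totals.append(sum(10 if rng.random() < 0.05 else 1
                          for _ in range(num_tenants)))
    return statistics.stdev(totals) / statistics.mean(totals)

# More tenants -> relatively smoother aggregate load, so a fixed GPU pool
# needs less spare headroom per tenant to absorb individual spikes.
for n in (10, 100, 1_000):
    print(n, round(aggregate_load_cv(n), 3))
```

With 1,000 tenants the aggregate load is roughly an order of magnitude smoother (relative to its mean) than with 10, which is the S3 effect in miniature: the bigger the tenant pool, the cheaper it is to tolerate any one customer's spike.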
I would guess the biggest AI product on the planet is Google's Search AI. Although even that might not be the case, unless your definition of AI is just "LLMs" and not any sort of ML that requires a GPU.