
Comment by acchow

9 hours ago

> This is a question that analysts don't even ask

On the contrary, data centers continue to pop up deploying thousands of GPUs specifically because the numbers work out.

The H100 launched at $30k per GPU and rented for $2.50/hr. It's been 3 years since launch, and the rental price is still around $2.50/hr.

During these 3 years, it has brought in $65k in revenue.
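For anyone checking the arithmetic, here is a rough sketch using only the figures quoted above. It assumes 100% utilization and a flat $2.50/hr rate over the whole period; the function names and the `utilization` parameter are illustrative, not taken from any real pricing tool. It covers gross rental revenue only, not any operating costs.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap days


def rental_revenue(rate_per_hour, years, utilization=1.0):
    """Gross rental revenue in dollars for one GPU over the given period."""
    return rate_per_hour * HOURS_PER_YEAR * years * utilization


def payback_years(purchase_price, rate_per_hour, utilization=1.0):
    """Years of rental needed for gross revenue to equal the purchase price."""
    return purchase_price / (rate_per_hour * HOURS_PER_YEAR * utilization)


# Figures from the comment: $30k purchase price, $2.50/hr, 3 years.
revenue = rental_revenue(2.50, 3)
print(f"Gross revenue over 3 years: ${revenue:,.0f}")
print(f"Payback period: {payback_years(30_000, 2.50):.2f} years")
```

Under those assumptions the total is $65,700, which is where the rounded $65k comes from. Passing a lower `utilization` scales the revenue down proportionally.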

Beyond GPUs themselves, you also have other costs such as data centers, servers and networking, electricity, staff and interest payments.

I think building and operating data center infrastructure is a high risk, low margin business.

They can run these things at 100% utilization for 3 years straight? And not burn them out? That's impressive.

  • Not really. GPUs are stateless, so regardless of how much you use them, the lifetime is bounded by the lifetime of the shittiest capacitor on there (essentially). Modulo a design or manufacturing defect, I’d expect a usable lifetime of at least 10 years, well beyond the manufacturer’s desire to support the drivers for it (i.e. the software should “fail” first).

    • The silicon itself does wear out. Dopant migration or something, I'm not an expert. Three years is probably too low an estimate, but they do die. GPUs dying during training runs was a major engineering problem that had to be tackled to build LLMs.