Comment by ryao

2 days ago

Wasn’t that figure just the cost of the GPUs and nothing else?

Yeah, I hate that this figure keeps getting thrown around. IIRC, it's the price of 2048 H800s for 2 months at $2/hour/GPU. If you take a month to be 30 days, that works out to roughly $5.9M, which lines up. What doesn't line up is ignoring the costs of facilities, salaries, non-cloud hardware, etc., which I'd expect to dominate. $100M seems like a fairer estimate, TBH. The original paper had more than a dozen authors, and DeepSeek had about 150 researchers working on R1, which supports the notion that personnel costs would likely dominate.
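
For what it's worth, a back-of-the-envelope check of that rental-price figure, using the inputs recalled above (2048 GPUs, two 30-day months, $2/GPU/hour; these are the comment's recollection, not DeepSeek's published numbers):

```python
# Sanity check of the quoted GPU rental cost
# (assumed inputs: 2048 H800s, 2 months of 30 days, $2 per GPU-hour).
gpus = 2048
hours = 2 * 30 * 24           # two 30-day months = 1440 hours
rate = 2.0                    # USD per GPU-hour
cost = gpus * hours * rate
print(f"${cost:,.0f}")        # prints $5,898,240
```

So the GPU-rental figure alone lands just under $6M, before any of the other costs mentioned above.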

  • >ignoring the costs of facilities, salaries, non-cloud hardware, etc.

    If you lease, those costs are amortized. It was definitely more than $5M, but I don't think it was as high as $100M. All things considered, I still believe DeepSeek was trained at one (perhaps two) orders of magnitude lower cost than competing models.

That is also just the final production run. How many experimental runs were performed before the final one was kicked off? The ratio could be something like ten hours of research compute for every hour of final training.