← Back to context

Comment by rpdillon

1 day ago

Yeah, I hate that this figure keeps getting thrown around. IIRC, it's the price of 2048 H800s for 2 months at $2/hour/GPU. If you consider months to be 30 days, that's around $5.7M, which lines up. What doesn't line up is ignoring the costs of facilities, salaries, non-cloud hardware, etc. which will dominate costs, I'd expect. $100M seems like a fairer estimate, TBH. The original paper had more than a dozen authors, and DeepSeek had about 150 researchers working on R1, which supports the notion that personnel costs would likely dominate.

>ignoring the costs of facilities, salaries, non-cloud hardware, etc.

If you lease, those costs are amortized. It was definitely more than $5M, but I don't think it was as high as $100M. All things considered, I still believe Deepseek was trained at one (perhaps two) orders of magnitude lower cost than other competing models.