Who actually knows? Far beyond what a UK university can afford.
I see. Let's just assume that DeepSeek's V3/R1 budget of ~$5.5M was a lie and the Alan Turing Institute is just too poor to compete with their nine-figure sums. I guess I have no further questions.
Yeah, the DeepSeek budget wasn't $6M by any means.
> the $5-6M cost of training is misleading. It comes from the claim that 2048 H800 cards were used for one training, which at market prices is upwards of $5-6M. Developing such a model, however, requires running this training, or some variation of it, many times, and also many other experiments (item 3 below). That makes the cost to be many times above that, not to mention data collection and other things, a process which can be very expensive (why? item 4 below). Also, 2048 H800 cost between $50-100M. The company that deals with DC is owned by a large Chinese investment fund, where there are many times more GPUs than 2048 H800.
https://therecursive.com/martin-vechev-of-insait-deepseek-6m...
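For what it's worth, here is a back-of-the-envelope sketch of the arithmetic in that quote. It uses only the figures quoted above (2048 cards, $5-6M per run, $50-100M for the hardware), none of which I have verified independently. The `run_multiplier` parameter is a purely hypothetical stand-in for "many times"; the quote gives no number for it.

```python
# Figures taken from the quote above; none are independently verified.
GPUS = 2048
SINGLE_RUN_COST_USD = (5e6, 6e6)     # quoted cost of one training run
HARDWARE_COST_USD = (50e6, 100e6)    # quoted purchase price of 2048 H800s


def per_gpu_price(hardware_cost, gpus=GPUS):
    """Implied purchase price of one card, given the total hardware cost."""
    return hardware_cost / gpus


def development_cost(single_run_cost, run_multiplier):
    """Total compute cost if the run (or a variant) is repeated.

    run_multiplier is a hypothetical assumption: the quote only says
    "many times" and does not give a figure.
    """
    return single_run_cost * run_multiplier


# Implied per-card price range from the quoted hardware cost.
per_gpu_range = tuple(round(per_gpu_price(c)) for c in HARDWARE_COST_USD)
print("Implied price per H800 (USD):", per_gpu_range)

# Illustrative only: what "many times" would mean at a few assumed multipliers.
for multiplier in (5, 10, 20):
    low, high = (development_cost(c, multiplier) for c in SINGLE_RUN_COST_USD)
    print(f"{multiplier}x runs: ${low / 1e6:.0f}M to ${high / 1e6:.0f}M")
```

On the quoted numbers, the hardware works out to roughly $24k to $49k per card, and even a modest assumed multiplier pushes compute alone well past the single-run figure.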