I see. Let's just assume that DeepSeek's V3/R1 budget of ~$5.5M was a lie and the Alan Turing Institute is just too poor to compete with their nine-figure sums. I guess I have no further questions.
Yeah, the DeepSeek budget wasn't $6M by any means.
> the $5-6M cost of training is misleading. It comes from the claim that 2048 H800 cards were used for one training, which at market prices is upwards of $5-6M. Developing such a model, however, requires running this training, or some variation of it, many times, and also many other experiments (item 3 below). That makes the cost to be many times above that, not to mention data collection and other things, a process which can be very expensive (why? item 4 below). Also, 2048 H800 cost between $50-100M. The company that deals with DC is owned by a large Chinese investment fund, where there are many times more GPUs than 2048 H800.
https://therecursive.com/martin-vechev-of-insait-deepseek-6m...
Oof, sounds like the budget for V3/R1 was exactly what I said. Having access to compute and running experiments is kind of the bare minimum for a supposed AI lab. And since this is a Western lab, their options for training are far more advantageous. But even if I were to accept all of those ridiculous fudged numbers, that's still within the Alan Turing Institute's budget.
Of course their anti-deep learning "most senior scientist" hasn't heard of DeepSeek, lol.