Comment by noosphr
5 days ago
DeepSeek's papers are a must-read for anyone who wants to understand how to make LLMs operate at hyper scale. All Western labs hide their best results, or at most release summaries that are about as meaningful as the answers Cleo used to give on Stack Exchange: https://math.stackexchange.com/questions/562694/integral-int...
Given how quiet all the major players went in the two weeks after DeepSeek R1 was released, I suspect they were reading and implementing everything in the papers that came with it as fast as humanly possible.
None of the major players have ever been quiet. DeepSeek enjoyed about a week or two of press before its spotlight was stolen by the next great model. It never held the top spot, ever, mind you. So I don't understand why you think the major players had to say anything about it, when the model was neither first, second, nor third in real-world capability, and why they would have to say anything when DeepSeek's service processes maybe 1/8 of what OpenAI, Google, or Claude handle in any given span of time.
I applaud their open efforts. But being "altruistic" and being best are two different things.
DeepSeek's contributions to training efficiency were as important as, if not more important than, the models themselves. A lot of the worry people had about DeepSeek came from questioning the moat of the big AI players, since DeepSeek was able to train a competitive model with so much less compute.
Their innovations in training efficiency were almost certainly studied closely by the big AI labs. For example, Dario Amodei describes the efficiency improvements as the really important contribution of DeepSeek V3 here: https://www.darioamodei.com/post/on-deepseek-and-export-cont...
> DeepSeek's team did this via some genuine and impressive innovations, mostly focused on engineering efficiency. There were particularly innovative improvements in the management of an aspect called the "Key-Value cache", and in enabling a method called "mixture of experts" to be pushed further than it had before.
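(For readers who want the gist of the second technique named above: a mixture-of-experts layer routes each token to only a few "expert" MLPs out of many, so total parameter count grows without per-token compute growing with it. Below is a minimal PyTorch sketch of top-k routing; it's illustrative only, not DeepSeek's implementation, which adds fine-grained and shared experts plus its own load-balancing scheme, and all the sizes here are invented.)

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Minimal top-k mixture-of-experts layer (illustrative only)."""
    def __init__(self, d_model: int = 512, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Each token activates only k of n_experts,
        # which is how MoE decouples total parameters from per-token FLOPs.
        probs = self.gate(x).softmax(dim=-1)          # (tokens, n_experts)
        weights, idx = probs.topk(self.k, dim=-1)     # (tokens, k)
        weights = weights / weights.sum(-1, keepdim=True)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            hit = (idx == e)                          # which tokens chose expert e
            rows = hit.any(dim=-1)
            if rows.any():
                w = (weights * hit).sum(-1, keepdim=True)[rows]
                out[rows] += w * expert(x[rows])
        return out
```

"Pushing MoE further", per the quote, roughly means more and smaller experts with fewer activated parameters per token, without the routing collapsing.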
Almost all of High Flyer's achievements have more to do with scaling the process, but when scaling is all you need, it's darn effective.
It crashed the market because retail investors, and perhaps non-retail ones as well, were overconfident in the USA's ability to maintain a lead thanks to the chip gap. High Flyer's innovations allowed them to scale and showed that this is not the case. That major event then likely spurred on many others. It was a mini 'Sputnik moment'.
Genuinely, it often seems most people need to find reasons to assume the best about DeepSeek and China in order to confirm their prior bias that "America bad" and "capital is evil". The reality is grey and fuzzy, with neither side having landed on the truth yet.
How would people use DeepSeek to conclude "capital is evil"? It came from a private hedge fund named High Flyer, not a state university project or something.
MLA (multi-head latent attention) is just one example of a best-in-class technique from Hangzhou that has seen wide adoption in US prestige labs.
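(Sketch of the core idea, for anyone who hasn't read the DeepSeek-V2 paper: instead of caching full per-head keys and values, MLA caches one small latent vector per token and up-projects it to K/V at attention time. The toy version below is hedged: real MLA also compresses queries and carries a separate decoupled RoPE key, and the dimensions are invented.)

```python
import torch
import torch.nn as nn

class LatentKVCache(nn.Module):
    """Toy version of MLA's KV compression: cache one low-rank latent
    per token instead of full keys/values (dimensions invented)."""
    def __init__(self, d_model=4096, d_latent=512, n_heads=32, d_head=128):
        super().__init__()
        self.down = nn.Linear(d_model, d_latent, bias=False)           # compress
        self.up_k = nn.Linear(d_latent, n_heads * d_head, bias=False)
        self.up_v = nn.Linear(d_latent, n_heads * d_head, bias=False)

    def forward(self, h: torch.Tensor, cache: list):
        # h: (batch, d_model) hidden state of the newest token.
        cache.append(self.down(h))             # store only the small latent
        lat = torch.stack(cache, dim=1)        # (batch, seq, d_latent)
        return self.up_k(lat), self.up_v(lat)  # rebuild K/V on the fly
```

With these made-up sizes you cache 512 floats per token instead of 2 × 32 × 128 = 8192, roughly a 16x cut in KV memory; that's the "Key-Value cache" management the Amodei quote upthread alludes to.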
And the saltiness of US labs about DeepSeek is well-known. "O3, explain model distillation like I'm five."
No Sam, explain intellectual property rights to the judge in the NYT test case, asshole.
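(Since it keeps coming up: "distillation" in the classic Hinton et al. sense means training a student model to match a teacher's temperature-softened output distribution, as sketched below. What the labs accuse each other of, fine-tuning on a rival model's sampled text through its API, is a looser cousin of this, since the teacher's logits are never visible. Illustrative code only.)

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T: float = 2.0):
    """Soft-label knowledge distillation (Hinton et al., 2015)."""
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    # KL(teacher || student); the T**2 factor keeps gradient magnitudes
    # comparable to a hard-label cross-entropy term.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T
```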
… wait, did you just seriously tell SamA that he's an asshole over copyright issues… while praising Chinese labs who couldn't give a rat's fuck and won't follow the same laws? Or pay creators? Physician, heal thyself.
I remember in February DeepSeek's <think> caused a moderately sized market crash. They didn't just go silent; almost every vendor implemented their own version of thinking models while blaming DeepSeek for stealing their tech / training on their models. It was rather pathetic to watch.
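(Context for the <think> reference: R1-style reasoning models emit their chain of thought between <think>...</think> tags before the final answer, and clients strip it or display it separately. The tag matches DeepSeek's output format; the parsing helper below is an illustrative sketch, not any vendor's code.)

```python
import re

def split_reasoning(output: str) -> tuple[str, str]:
    """Split R1-style output into (reasoning, answer)."""
    m = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if m is None:
        return "", output.strip()   # model emitted no thinking block
    return m.group(1).strip(), output[m.end():].strip()

reasoning, answer = split_reasoning(
    "<think>9.11 vs 9.9: compare tenths first…</think>9.9 is larger."
)
```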
OAI and others were already on their way there or had already released such models. How did you manage to convince yourself that High Flyer did it first? And that everyone else copied from them post hoc? You've created a new chain of causality that simply does not match reality.
Yeah I confess I rewrote history and crashed the stock market. Then ran out of juice just as I was about to kill Hitler.