Comment by kamranjon
3 days ago
DeepSeek continues to not only push the boundaries but also publish these incredible papers explaining how they achieved their gains - something the American labs no longer do unfortunately. Chinese labs are doing the most interesting work in AI right now.
>publish these incredible papers explaining how they achieved their gains - something the American labs no longer do unfortunately.
Google is still releasing a lot of LLM architecture research. They introduced speculative decoding of LLMs in 2022 [1], then released the code to perform speculative decoding for their Gemma 4 model this year [2].
[1] https://arxiv.org/abs/2211.17192
[2] https://github.com/google-gemma/cookbook/blob/main/docs/mtp/...
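For anyone unfamiliar with the technique in [1]: a small draft model guesses several tokens ahead, and the large target model verifies them, keeping the longest prefix it agrees with. Below is a minimal greedy sketch. The paper's version uses a rejection-sampling rule so the output distribution matches sampling from the target; this simplification only preserves greedy output. The "models" here are plain functions returning the next token, an assumption for illustration.

```python
from typing import Callable, List, Tuple

# Assumption: a "model" is any function mapping a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding; the result equals greedy decoding with `target`."""
    out = list(prompt)
    rounds = 0  # verification rounds; a real engine checks all k positions in one pass
    while len(out) - len(prompt) < max_new:
        rounds += 1
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target keeps proposed tokens while they match its own choice.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            accepted.append(expected)  # on a mismatch, the target's token replaces the guess
            if tok != expected:
                break
        else:
            # All k guesses accepted: the verification pass yields one bonus token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[: len(prompt) + max_new], rounds
```

With a perfect drafter and k=4, each round yields 5 tokens, so 12 new tokens take 3 target rounds instead of 12 sequential steps; a worse drafter just means more rounds, never different output.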
Thanks for the clarification - Google does publish more than others - and I actually really appreciate the work they are doing with the Gemma models, which are truly competitive open models. I do wish they’d publish more in depth papers on their Gemma models but appreciate that they are open weights.
They weren't the first to do MTP like this, and arguably did it wrong: the MTP heads are kept in a separate file and have to be welded in by the inference engine.
Qwen 3.6 shipped with working MTP first, and had working MTP in llama.cpp first.
Given the MTP drafter is basically a separate model, keeping it separate makes more sense IMO. It's out of my wheelhouse but it seems like you could adjust the MTP drafter model separately from the main model, too.
Ultimately, though, I think the real explanation is that Google doesn't care, since for their own purposes (in LiteRT-LM) they do bundle them. As far as I know, anyway.
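To make "welded in by the inference engine" concrete: if the MTP heads ship as a separate checkpoint, the loader has to merge two weight maps into one namespace at load time. A hypothetical sketch; the `mtp.` prefix and the key names are made up, and real engines use their own checkpoint layouts.

```python
from typing import Dict, Mapping


def weld_mtp_heads(main: Mapping[str, object], mtp: Mapping[str, object],
                   prefix: str = "mtp.") -> Dict[str, object]:
    """Merge a separately shipped MTP-head checkpoint into the main model's weights.

    Hypothetical layout: MTP weights are namespaced under `prefix` so they
    cannot silently overwrite a main-model tensor with the same name.
    """
    merged = dict(main)
    for name, tensor in mtp.items():
        key = prefix + name
        if key in merged:
            raise KeyError(f"MTP weight {key!r} collides with a main-model weight")
        merged[key] = tensor
    return merged
```

Keeping the drafter in its own file, as the reply below argues, means this merge step lives in every engine; bundling moves it to export time instead.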
Nvidia's Nemotron 3 Super also shipped with MTP.
They also shipped Gemma models with their new MatFormer architecture, which allows for dynamic computation.
https://arxiv.org/pdf/2310.07707v2
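The nesting idea behind MatFormer, as I understand the paper, is that smaller sub-models reuse a prefix of the larger model's feed-forward hidden units, so one set of weights can be served at several sizes. A toy single-block sketch in plain Python; the function name, the weight layout, and the ReLU choice are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]


def relu(x: float) -> float:
    return x if x > 0 else 0.0


def nested_ffn(x: List[float], w_in: Matrix, w_out: Matrix, m: int) -> List[float]:
    """Feed-forward block that uses only the first m hidden units.

    w_in:  hidden_dim rows, each of length d_model (row j computes hidden unit j).
    w_out: d_model rows, each of length hidden_dim.
    A smaller m is a strict prefix of a larger one, so all sizes share weights.
    """
    hidden = [relu(sum(wi * xi for wi, xi in zip(row, x))) for row in w_in[:m]]
    return [sum(w * h for w, h in zip(row[:m], hidden)) for row in w_out]
```

Picking `m` per request (or per layer) is what lets the same checkpoint trade quality for compute at inference time.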
Probably because American AI companies are on the hook for quite a lot of investment money. I think they are trying to find the magical moat to justify their valuation.
Revealing optimizations similar to these would pretty much reduce their competitive position.
Chinese labs are also still behind, so they’re incentivized to collaborate and have no reason to do it in private.
I suspect their tune will change if they ever take the lead.
The question is also what game they're playing. Deepseek came out of a hedge fund. I think it's no coincidence that their publications tend to have a large impact on AI stock prices.
Destroying the growth story of overvalued stocks is an interesting investment strategy. It's not even new. Short sellers understandably get a terrible rep from execs, but their actions are more often in the public interest than you'd think. Normally it's exposing fraud, but here we get the really fortunate side benefit of what could eventually amount to the most significant contribution to the general software community since Linux.
Which is a good thing. Self-serving motives are more reliable than altruistic ones.
Projection is a funny thing. It causes people to misread situations all the time. Southern slaveowners feared violent retribution from freed slaves, for example [1]. It was pure projection and said more about the South than it did the slaves. The reality was there was no violent retribution. It was the opposite where the former slaveowners continued to inflict violence on the formerly enslaved.
I say this because we see the same thing used as an argument against China. "If they overtake us, they'll do imperialism (like us)." Again, it says more about us than them.
A better reading (IMHO) of the situation is that China believes that AI shouldn't be used simply to mint a few more trillionaires but that the benefits should be shared with society. Why do I say this? Because we now have 70+ years of China doing exactly that. The transformation in China all the way from rural villages to Tier 1 cities has been utterly astounding. China has lifted ~800M people out of extreme poverty.
In some ways we're at a similar point to the late 1990s and 2000s when Microsoft execs complained that Linux, being free, destroyed intellectual property value. Linux should be a perfect example of how people can and do act altruistically, or at least not in a way to bait-and-switch to enrich themselves.
[1]: https://www.reddit.com/r/AskHistory/comments/1d26grm/in_the_...
> Chinese labs are also still behind, so they’re incentivized to collaborate and have no reason to do it in private.
US labs at Google, Meta, and SpaceX are not leading; none of them has managed to build something on par with GLM 5.2.
Care to explain to me why they still don't collaborate and still choose to do it in private?
They are focused on the things you do when you are not over-capitalized and you can’t get unlimited nvidia hardware to train on. And the results speak for themselves.
Meanwhile, we in the US are blocked from buying Huawei GPUs, and retirees are boasting about the Nvidia in their portfolios.
Also, historically, China has always viewed intellectual property as public property. Similar to open source.
> Chinese labs are also still behind, so they’re incentivized to collaborate and have no reason to do it in private.
Even if they're ahead they don't have enough GPUs to scale. Open sourcing is hence a good strategy to at least get market share (even if not $).
Are they behind in models, or behind in VC money to burn on subsidized compute offered to the public and early customers?
Genuine question.
Not everyone is motivated by greed
True!
So the marketplace is working.
Regardless of where they are, the Chinese will always share their progress, as they're collectivist/cooperative at their core, compared to the individualistic/competitive US.
I don't really see the moat for frontier AI labs being "more efficient models", although that could help their margins. I think moats will be built through horizontal and vertical market expansion, which Anthropic is doing the most of at the moment.
I'm seriously far from fear-mongering and doomsday thinking, but I just can't see how OpenAI and Anthropic can have a successful IPO if the quality gap between free and paid models keeps narrowing like this...
Who is financing DeepSeek and what are they expecting in return?
Until recently, DeepSeek were self-financed (it was a spin-out from a hedge fund). They just raised ~50 billion RMB (US$7bn), and according to media reports [0] (which admittedly can be unreliable), the lead investors were:
1) The CEO himself
2) Tencent
3) CATL (the battery company)
4) NetEase (internet/media company)
5) JD.com (ecommerce)
6) Chinese investment firms
What are they expecting in return? I'd say the same thing that all those investors in OpenAI and Anthropic are expecting - profit.
[0] https://finance.sina.com.cn/stock/vcpe/2026-06-11/doc-iniazi...
I don't think this question gets at the reason. There could be one or two people in charge who simply shape the company's culture, including how much to publish.
They are self-financed; the company that makes DeepSeek is a finance company that trades on the markets.
IMHO, to promote the idea that China believes in free markets and in making the technology available to all.
That will likely help them bolster international sales of the MANY new AI chips in development or in use in China, dislodging Nvidia.
Kinda the opposite of what Jensen Huang (Nvidia) thinks US is doing: https://www.youtube.com/shorts/u3SY8nvjhQA
Edit: I'm a fan of DeepSeek and believe it's good to make the technology open/available. And I do think it also helps business, which I support as well.
Edit 2: No idea why I'm getting downvoted. That's also their official stance https://english.www.gov.cn/news/202601/08/content_WS695f1b55...
Short AI companies
???
Profit!
Not suggesting this is it, but you know, one possible angle.
Chinese labs are also forced to find performance optimizations since they aren’t allowed to buy the best chips.
> Probably because American AI companies are on the hook for quite a lot of investment money
That's a lot of words to say it's just capitalist greed.
This is incorrect binary thinking. Them releasing open source can be good, but that does not commit you to think that china or chinese companies are saints. There are many shades of grey here and one does not exclude the other (nor include it).
I think it's in our best interest to use whatever leverage we can to get these American AI companies to show at least some degree of freedom and transparency…
Publishing by necessity, I wonder? American labs are on the cutting edge, pioneering the way forward, so DeepSeek open-sourcing what they’ve got helps even the playing field.
Hopefully the experts here can offer insight. The above is just my hunch and I’m not a specialist in this field.
Yes, challenger labs publish out of necessity. It is a marketing strategy. People assume open source means giving something up, but the reality is that Z.ai has revenue of some $100M, and it would be about $0M if they had never open-sourced their models.
Wouldn’t that just help the American labs anyway though? Or do they assume they’ve actually already figured this stuff out and kept it secret?
It used to be the case that NSA hired the majority of all math graduates in the US, and were assumed to be years ahead in cryptography. Yet in the 90s, it became clear that they no longer were that - among other things, the cipher of the notorious Clipper chip was broken, and we can rule out that it was made weak on purpose because the whole point of Clipper was that they had a backdoor.
So, despite hiring the cream of the crop of math graduates, who could read the papers of free academia while the free world could not access their own results, they fell behind.
I have a theory explaining why. I think it's because science is an interactive process. NSA cryptographers could read papers, but they couldn't talk openly with the authors of those papers because of secrecy demands - even asking a question might indicate what they were working on. You can easily imagine them spending months on something they could have avoided by going to the original authors and getting told "Oh, we tried that for a long time, it doesn't work".
Whether that theory is right or not, cryptography is a concrete example of a domain where public research with fewer resources beat private research with a lot more resources.
From what I gather, the Chinese are behind, but a lot of their research amounts to scrappy, clever discoveries in how to use more novel technologies (for Qwen and DeepSeek, it's mixture-of-experts models, which can do inference using only a portion of the model at a time). The Chinese also distill information from American models, so there’s that.
The American companies, from my impression, don’t involve themselves with such lowly “hacks” because they have so much money to just push forward with doing everything on big heavy models that run on the most cutting-edge Nvidia chips that they can, at the moment, kinda sorta get on demand (I say that with some degree of jest).
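A sketch of what "using a portion of the model at a time" means in a mixture-of-experts layer: a router scores the experts for each token, only the top-k run, and their outputs are mixed by renormalized router weights. Here the router logits are passed in directly (in a real model a learned layer computes them from the token), and the experts are arbitrary functions; both are simplifications.

```python
import math
from typing import Callable, List, Sequence, Tuple

Expert = Callable[[List[float]], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: List[float], experts: Sequence[Expert],
                router_logits: Sequence[float], k: int = 2) -> Tuple[List[float], List[int]]:
    """Run only the top-k experts for this token and mix their outputs."""
    top = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in top])  # renormalize over chosen experts
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        y = experts[i](x)  # unselected experts are never evaluated
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, top
```

The compute per token scales with k, not with the total number of experts, which is where the inference savings come from.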
I'm afraid I'm even balking at the word "pioneering" in context with US frontier labs. They are probably doing a few new things, right, but they are not blazing any trails for others to follow along, the Chinese are.
Or if the US labs are innovating, they're not talking about specifics.
Chinese papers and techniques have been very influential and copied by US labs.
Multi-head Latent Attention (MLA), Multi-Token Prediction (MTP), and the MoE architecture are some of the most famous examples.
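A rough sense of why MLA matters for inference: instead of caching full per-head keys and values, it caches one compressed latent per token (plus a small shared RoPE key). The arithmetic below assumes the configuration I believe DeepSeek-V2 reported (128 heads of dimension 128, a 512-dim KV latent, a 64-dim decoupled RoPE key, 60 layers) and 2-byte elements; treat those numbers as assumptions.

```python
def mha_elems(n_heads: int, head_dim: int) -> int:
    """Cached elements per token per layer for standard multi-head attention."""
    return 2 * n_heads * head_dim  # one key and one value vector per head


def mla_elems(latent_dim: int, rope_dim: int) -> int:
    """Cached elements per token per layer for MLA (assumed layout)."""
    return latent_dim + rope_dim  # compressed joint KV latent + decoupled RoPE key


def kv_cache_bytes_per_token(n_layers: int, elems_per_layer: int,
                             bytes_per_elem: int = 2) -> int:
    return n_layers * elems_per_layer * bytes_per_elem


# Assumed DeepSeek-V2-like configuration.
mha = kv_cache_bytes_per_token(60, mha_elems(128, 128))  # ~3.75 MiB per token
mla = kv_cache_bytes_per_token(60, mla_elems(512, 64))   # ~67.5 KiB per token
```

Under these assumptions the cache shrinks by roughly 57x, which translates directly into longer contexts or bigger batches per GPU.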
MoE is from Google (Noam Shazeer)
MTP is from Meta
Another DeepSeek advance that the west are copying is DeepSeek Sparse Attention (DSA)
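DeepSeek Sparse Attention is described as using a lightweight indexer to pick which earlier tokens each query attends to. A simplified sketch of the selection step, assuming the indexer scores are already computed and passed in (the real indexer is a small learned module, not shown here); with k equal to the sequence length it reduces to ordinary dense attention.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def topk_sparse_attention(q: Vector, keys: List[Vector], values: List[Vector],
                          index_scores: Sequence[float], k: int) -> Tuple[Vector, List[int]]:
    """Attend only to the k positions with the highest indexer scores."""
    chosen = sorted(range(len(keys)), key=lambda i: index_scores[i], reverse=True)[:k]
    # Scaled dot-product attention restricted to the chosen positions.
    logits = [dot(q, keys[i]) / math.sqrt(len(q)) for i in chosen]
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    out = [0.0] * len(values[0])
    for w, i in zip(weights, chosen):
        out = [o + (w / total) * v for o, v in zip(out, values[i])]
    return out, chosen
```

The point of the design is that the expensive attention math runs over k tokens instead of the whole context, while the cheap indexer still looks at everything.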
> Publishing by necessity
It's more a cultural thing. Sharing progress is just in their blood.
This is overly simplistic to the point of glazing. Plenty of Chinese companies maintain industrial secrets to gain an advantage.
Exactly. They did not have to open up their research, and this is what happens when smart researchers are forced to squeeze performance gains out of existing hardware.
They don't have TPUs or access to the latest Vera Rubin GPUs to get performance gains for free either. All of the optimizations DeepSeek have done are in software, and they go down to the PTX assembly level.
Compare that to Anthropic, who are celebrating fixing a flickering issue in a terminal app that took months to fix.
> All of the optimizations Deepseek have done are in software and it goes down to the PTX assembly level
DeepSeek are still using NVIDIA (PTX) to train on, but for inference have already transitioned to Huawei Ascend chips, and inference speed is what this paper is addressing.
Anthropic almost certainly also has optimized software down to the assembly level, considering this take-home interview challenge they published: https://github.com/anthropics/original_performance_takehome/... which is all about instruction-level performance optimizations. That they don't prioritize UI fixes just means they consider other things more important.
Unlikely: that product is written completely by AI, of which they are not lacking.
More likely, an AI-generated codebase is impossible for humans to fix, and SOTA models weren't able to figure it out until now.
that's pretty silly to use as a measure of what they do internally
> Compared to Anthropic who are celebrating in fixing a flickering issue in a terminal app which took months to fix.
It's funny, because if you ran Claude Code on a slow terminal, the cause of the flicker was obvious: they kept dumping the entire history of the chat back into the terminal in a number of situations and relied on the terminal to then end up in the correct state.
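A toy model of the behaviour described above, with no claim about how Claude Code is actually implemented: if every update re-emits the whole chat history, the total bytes pushed through the terminal grow quadratically with session length, while appending only the new line grows linearly. On a slow terminal, that quadratic redraw is what you see as flicker.

```python
from typing import List, Tuple


def bytes_written(lines: List[str]) -> Tuple[int, int]:
    """Total bytes sent to the terminal over a session, for two redraw strategies.

    full:        after each new line, re-emit the whole history (simplified model).
    incremental: emit only the new line.
    """
    full = 0
    incremental = 0
    history = 0
    for line in lines:
        size = len(line.encode()) + 1  # +1 for the newline
        history += size
        full += history
        incremental += size
    return full, incremental
```

For 100 lines of 10 bytes each, the full-redraw strategy writes 50,500 bytes versus 1,000 for appending.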
All frontier labs are working down to the PTX level (and lower)
Deepseek is commoditizing the performance gains US labs rely on to make their investors money.
Chinese companies (and labs) operate in conjunction with the CCP so whatever they're doing, it's because it's Chinese state policy.
What became clear when DeepSeek came onto the scene was that China was seeking to commoditize LLMs. They consider it an issue of national security not to be beholden to US tech companies when it comes to AI. And I, for one, fully endorse this policy.
Another data point on this is the black market for Claude tokens in China [1]. The chat logs themselves are a commodity to train models.
I believe that OpenAI in particular is a bet on a trillion dollar pot of gold that doesn't exist. Google, Microsoft, Amazon and Meta will all be fine. Anthropic is in a far better position than OpenAI (IMHO) but if DeepSeek or some other Chinese open weight model gets as good at coding, they're in real trouble too.
[1]: https://news.ycombinator.com/item?id=48667495
I don’t see how Anthropic is in a better position. They have a slight edge in model quality right at a time when we’re getting a taste of what cheap, “good enough” AI looks like. They don’t own their own compute. And their own arrogance and lies have alienated a huge chunk of their customer base and alerted everyone to the dangers of being dependent on them.
I personally think not owning their own compute is going to be an advantage.
There is a meteor headed towards all this AI investment that I don't think has been properly accounted for: what happens to all the existing hardware investments when Nvidia's next architecture comes out? Blackwell (B200/GB200) is the current generation. Rubin (R100, presumably R200) is the next and arrives soon. A lot of the investment hasn't been spent yet, so it will likely be spent on Rubin, but at that point, what happens when the next iteration comes out and does 3-4x the compute for the same electricity input and same hardware cost?
Also, what happens when people can run way bigger models on consumer hardware in 5 years? The effective limit for useful local LLMs is currently ~31B parameter models because the RTX 5090 has 32GB of VRAM and Apple's shared memory architecture, which can keep bigger models in memory, just doesn't have the raw processing power.
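The back-of-the-envelope arithmetic behind the ~31B figure: weight memory is roughly parameter count times bits per parameter, so 31B parameters at 8-bit quantization take about 31 GB before any KV cache or activations. A sketch, with the quantization levels as assumptions:

```python
def weight_memory_gb(params_billions: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in GB (10**9 bytes).

    KV cache, activations, and runtime overhead come on top of this.
    """
    # params * 1e9 * bits / 8 bytes, divided by 1e9 bytes per GB
    return params_billions * bits_per_param / 8
```

So a 31B model is about 31 GB at 8-bit, about 15.5 GB at 4-bit, and 62 GB at 16-bit, which is why 32 GB of VRAM puts the practical ceiling around there unless you quantize harder.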
Anyway, the reason I argue Anthropic is in a better position (than OpenAI) is that they seem to have captured a market that may well be profitable for them as a company, specifically Claude for coding. So they haven't burnt quite as much cash as OpenAI and aren't in as deep a hole.
While I think local models are going to improve massively over the next few years, running them in a data center at scale is always going to be cheaper for a company. Why? Because they can amortize their costs by running 24/7, and powering and cooling them is simply cheaper at scale when you're talking about 1000+ engineers who otherwise might only be using their hardware ~40 hours a week.
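The amortization argument as arithmetic, covering only utilization (not the power and cooling savings): the same hardware cost spread over 168 hours a week instead of ~40 is about 4.2x cheaper per hour actually used. The price and lifetime below are placeholders.

```python
HOURS_PER_WEEK = 24 * 7  # 168 hours of possible use per week


def cost_per_used_hour(hardware_cost: float, lifetime_weeks: float,
                       hours_used_per_week: float) -> float:
    """Hardware cost divided by the hours the hardware is actually busy."""
    return hardware_cost / (lifetime_weeks * hours_used_per_week)


desk = cost_per_used_hour(1000.0, 52, 40)             # one engineer's machine
datacenter = cost_per_used_hour(1000.0, 52, HOURS_PER_WEEK)  # shared, always busy
```

The ratio `desk / datacenter` comes out to 168 / 40 = 4.2 regardless of the placeholder price, since the cost and lifetime cancel.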
IMHO Google is in the best position here of all the US companies, even though their models aren't the best, because their data centers are ruthlessly efficient, their homegrown TPUs will eventually catch up (and thus avoid the NVidia tax) and they simply haven't bet the farm on winning AI.
> Another data point on this is the black market for Claude tokens in China [1]. The chat logs themselves are a commodity to train models.
anyone with IQ higher than 130 (thus qualified for actual AI R&D) would be questioning something obvious here -
if they are already doing such dodgy stuff with the aim of maximizing profits, why would those resellers have large amounts of logs with actual American model responses to sell to those AI labs in the first place? shouldn't they just post-train & customize some leading Chinese open source models to pretend to be Opus or GPT for the vast majority of their users (as classified by some models) who don't know much about expected Opus behaviours & aren't skilled enough to tell the difference?
that is actually the interesting bit not covered in your censored version of the story line, it is also what happens on the ground. your censored version of the story implies that those dodgy resellers using stolen credit cards, pooling accounts with stolen IDs and illegally selling very personal logs would somehow be honest enough to spend extra $ to ensure their victims (aka paying users) can actually use real Opus and GPT. LOL
dude, you failed this IQ test miserably.
You don't actually need a very high IQ to do AI R&D. More than it takes to post IQ comments on this site, maybe.
The galaxy brains in the labs putatively buying the logs wouldn't notice this? Or figure out a structure to prevent this?
Their R1 paper was really well-done. But I think it leaves out a few details necessary for stable training.
https://cameronrwolfe.substack.com/p/grpo-tricks
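For context on what R1's training builds on: GRPO samples a group of answers to the same prompt and normalizes each reward against the group's mean and standard deviation, instead of training a separate value model. A minimal sketch of that advantage computation; the epsilon is one of the small stability details implementations add, and whether to divide by the standard deviation at all varies between implementations.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Advantage of each sampled answer relative to its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    # eps avoids division by zero when every answer gets the same reward
    return [(r - mean) / (std + eps) for r in rewards]
```

A group where all answers score the same yields all-zero advantages, so that prompt contributes no gradient; this is one reason such details matter for stable training.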
Yep. It's about time the Western world realized the Chinese are not the "very bad guys under dictatorship".
Honestly, it's just a difference in hierarchy between the two countries. In the US, tech/fin/military companies have the upper hand over the government (which is fragmented between 2 parties). Despite the charades with Anthropic, tech-fluencers are in control. In China, by contrast, the government (dictatorship) has more control over tech companies (take any example from the past 10 years). For them, undermining US AI supremacy is an objective, releasing open-weight models is the way to do it, and I'm all for it.
Let's not get crazy here. You can acknowledge that the Chinese AI industry has some structural advantages right now without trying to claim anything else. China is still a brutal autocracy.
I don't think it's very common to believe the Chinese people are bad guys. It's the government and its control of the people that's the problem. And no, I don't think the US is immune to that sort of problem either.
R1 was very influential on US models development.
I'm deep seeking for that open in OpenAI indeed. It’s clear who’s the most anthropocentric in this space.
This is so out of touch. Go to Neurips or the top AI conferences to see what is happening.
If American labs aren't publishing, it doesn't mean they aren't doing even more interesting work.
So fascinating, can't wait to never hear about or be affected by this research until it’s discovered elsewhere.
I genuinely wonder how it feels to be working your whole life, actual flesh and blood and heart and mind pouring 40 to make something that is a dead end on the tree of human progress because its miserly masters are terrified of sharing knowledge.
Days and nights spent playing pretend human pioneer, when you are a lunatic on an island building towers of coconuts.
You could also come up with a cure for cancer, but if nobody knows what you’ve done then there’s not a whole lot we can say about it
It's because our culture worships pieces of paper the government tells us are worth something.
Money is just a physical representation of the ability to get what you want. The problem is not money. It’s the fact that we live in a “me” society.
Nope, people seek it out because government tells them to pay taxes _or else_.
It's almost as if... they were what OpenAI was when it started. Sad to see, but glad someone is doing it.
The difference between greed and power
Doing work ≠ publishing work
They push the boundaries, alright. Of obtaining the results of work without doing the work themselves, which, I hate to say it, is classic Chinese machiavellianist business behavior:
https://www.cnbc.com/2026/06/24/anthropic-alibaba-distillati...
You mean like training off of pirated copyrighted works for example that Anthropic, OpenAI, and Google stole from the internet?
Thank you so much to everyone at DeepSeek who is working on this and who have the courage and generosity to open source this for humanity.
We in the United States will never forget!
For all the harm Trump does to the US at least he is helping China!
The idea that America is going to stay ahead of China is I think at this point clearly delusional. It's also just such silly framing. Why should 350 million people stay ahead of 1 billion people on the other side of the world? If an AI lab in China cures cancer or something do Americans lose?
So many Americans seem to (at least in theory) be ready to sign up for this ongoing confrontation with China. Does anyone think it isn't America who is poking the bear when it comes to the Thucydides trap? Why not try to get along? It occurs to me the only people more Chinese innovation would hurt are the mega-cap class in the United States. Elon Musk certainly doesn't want BYD in the United States. Same story all the way down with these super-capitalized AI companies. Most average Americans would probably be better off in a world where the United States and China got along. But it's those Americans who will be called upon to bear most of the burden if that trap ever springs.
By this population-only logic, you should concede that India will overtake China.
Why not talk about how China shut out American companies for decades before complaining about BYD?
As an Indian immigrant: the PRC has engaged in conflict with almost all its neighbors and started wars in its short history.
China is not so benevolent when they get to the #1 spot:
https://m.economictimes.com/industry/renewables/china-wto-co...
It's not population-only logic, but it does underscore that it is silly to expect the United States to inevitably be ahead.
As for the rest of it:
https://youtu.be/74DAI2hr9Kk?t=159
Google and Microsoft publish more than enough, and American universities are publishing the science beyond DeepSeek's engineering. The fact that you don't know about them means you're not following the science, only reading Hacker News.
Google hasn’t published much in depth ML work since T5 (which was hugely influential at the time) - most Gemma releases are 1-3 page model card pdfs these days with no in depth analysis. Even TurboQuant is shaking out to have basically been a rehash of previous work without proper attribution. I do think Microsoft is doing some interesting things with smaller models but haven’t read much research, interested in any refs you might have to share!
Check recent ICLR, ACL, ICML, and NeurIPS proceedings; you will see 10-20 papers from Google Research that are not just simple model cards. They are solid, reproducible research.
Sure, in part by "stealing" from American AI companies with Distillation attacks:
https://yipzap.com/anthropic-accuses-alibaba-of-largest-ai-d...
If your moat is “please don’t copy my outputs”, you don’t have a moat. There is no such thing as a distillation “attack”.
How does it differ from pirating music or movies?
How very machiavellianist-libertarian of you.
Don't even try to combine it with any notion of "leadership" then, however, since distillation is literally "copying the actual leader"
US AI companies trained their own models on vast amounts of copyrighted and publicly available content without obtaining permission. There's no moral high ground here.
While I don't agree with your comment being downvoted, I don't think distillation is either an "attack" nor is it "stealing". The idea that someone else gets to decide how I use tokens that I pay for is ludicrous.
Imagine if your casio calculator would come with a ToS that says you can't use it to develop a competitor calculator or any other tools. Or that your hammer can't be used to make other tools. Or, closer to the HN crowd, imagine MS in the 90s saying that you can't use their OS to build competing services to MS. They'd be laughed at and be split immediately if they tried that.
The only thing they can do is to refuse serving tokens (and even that's debatable, if we get to tokens being commoditised). But that's gonna be a game of whack-a-mole, and they know it.
Besides "attack" being a ludicrous name for distillation, note how your article says "accuses", also it's mostly about Alibaba, not DeepSeek (although it's mentioned there). Both Dario Amodei and Sam Altman publicly claimed that DS used their outputs to train their models, and knowing the differences between all these models by heart, I believe they're simply lying through their teeth to sway the public opinion and/or the policy. These models are absolutely nothing alike, and distillation necessarily makes student's outputs similar to teacher's. This is very visible in Z.ai models (which were trained on Gemini outputs to the point that they repeated Google's conditional prompt injections in the CoT, and later on Claude where it started repeating their CoT as well) and certain Google models which were trained on Claude's outputs in a roundabout way. Distillation always shows up in the result.
And certainly they have no idea whether these outputs (assuming they ever existed and it wasn't made up) were used for training. The article mentions that DS made 150k requests. This isn't much and might have been just an eval or a benchmark to compare their own model against. It's really hard to believe DeepSeek had any Claude outputs anywhere in their training schedule, since it's just too different. Besides training on random vibecode of course, which is mostly written by Claude.
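What "distillation necessarily makes student's outputs similar to teacher's" means mechanically: the classic logit-level form minimizes the KL divergence from the teacher's softened distribution to the student's, which is zero only when the two match. A sketch with the temperature as an assumed hyperparameter. When only generated text is available through an API, the same pressure shows up as ordinary supervised fine-tuning on the teacher's samples.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [v / temperature for v in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: Sequence[float], student_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened next-token distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

Minimizing this loss pulls the student's distribution toward the teacher's, which is why heavy distillation tends to leave recognizable fingerprints in the student's outputs.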
You know what, if someone wants to downvote this guy by claiming distillation attacks are not "attacks" or don't cross some ethical bound (especially since I just posted a similar comment), then go right ahead, but if you're combining it with any notion of "leadership", that's like saying that the person in 2nd place in a bike race who is drafting behind the person actually in 1st place is exhibiting "leadership".
There's no "leader" if, absent someone whose results you're copying, you are an emperor without clothes