Comment by theptip
3 days ago
I think it’s very plausible that the OSS models are being distilled from too, but note that the benefit is asymmetrical.
You can’t get an Opus 4.5 by distilling from DeepSeek. What you might be able to get is a slightly more cost-effective training data generation pipeline, or something along those lines.
In the other direction, my belief is that DeepSeek could not have been trained without distilling from US labs. They simply didn’t have the compute to do the pre-training required.