Comment by cmrdporcupine
3 days ago
BTW the distillation (or accusations of it) seems to go both ways. I've seen multiple reports of people asking Claude what model it is -- in Chinese -- and having it answer that it's DeepSeek.
They're all scavengers, and we're the roadkill.
I think it’s very plausible that the OSS models are being distilled too, but note that the relationship is asymmetrical.
You can’t get an Opus 4.5 by distilling from DeepSeek. What you might be able to get is a slightly more cost-effective training data generation pipeline, or something along those lines.
In the other direction, my belief is that DeepSeek could not have been trained without distilling from US labs. They simply didn’t have the compute to do the pre-training required.