Comment by dragonwriter
2 days ago
> Other companies were allegedly distilling the models by training on the reasoning output
In the case of makers of open-source models (which are also competition), there is no “allegedly”: they were (and still are) openly doing that.
In the case of the closed models too... Claude would happily tell you it was DeepSeek-V3 if you asked in Chinese, until it caught public attention and they papered over it.
The word “openly” is in my post for a reason. The commercial models are not openly distilled from competitors, whereas many open-source models state in their model documentation that distillation was done from a dataset drawn from specific other models, including commercial models.
That distillation might be inferred from the behavior of commercial models is not the same as them openly doing it.
Fair enough!