Comment by ForHackernews
2 months ago
AFAIK DeepSeek have not publicly acknowledged training their model on OpenAI output - the OpenAI people have alleged that they did.
At any rate, I don't think distillation involves 'slurping out' the whole model. As I understand it, it means using the other model's outputs as training data for your new model. Maybe it's analogous to an expert teaching a novice by providing carefully selected examples, without having to expose the novice to all the blind alleys the expert went down to achieve mastery.
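Roughly what that looks like in code: here's a toy soft-label distillation loop in PyTorch (the classic Hinton-style variant, where the teacher's output distribution is the training signal; the models, sizes, and temperature below are all made up for illustration, and real LLM distillation would use generated text from the teacher rather than tiny linear layers):

    import torch
    import torch.nn.functional as F

    teacher = torch.nn.Linear(16, 4)   # stand-in for a large trained model
    student = torch.nn.Linear(16, 4)   # the new model being trained
    opt = torch.optim.SGD(student.parameters(), lr=0.1)
    T = 2.0                            # temperature softens the teacher's distribution

    for _ in range(100):
        x = torch.randn(32, 16)        # inputs we can query the teacher on
        with torch.no_grad():
            # the teacher's outputs become the training data; its weights
            # are never read, only its predictions
            soft_targets = F.softmax(teacher(x) / T, dim=-1)
        log_probs = F.log_softmax(student(x) / T, dim=-1)
        loss = F.kl_div(log_probs, soft_targets, reduction="batchmean") * T * T
        opt.zero_grad()
        loss.backward()
        opt.step()

The key point is in the `no_grad` block: all the student ever sees is what the teacher says about some inputs, which is why you can distill from a model you only have API access to.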