Comment by ForHackernews
2 months ago
AFAIK DeepSeek have not publicly acknowledged training their model on OpenAI output - the OpenAI people have alleged that they did.
At any rate, I don't think distillation involves 'slurping out' the whole model. As I understand it, it means using the other model's outputs as training data for your new model. Maybe it's analogous to an expert teaching a novice by providing carefully selected examples, without having to expose the novice to all the blind alleys the expert went down to achieve mastery.
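Roughly what that looks like in code: here's a toy soft-label distillation loop in PyTorch (the classic Hinton-style variant, where the teacher's output distribution is the training signal; the models, sizes, and temperature below are all made up for illustration, and real LLM distillation would use generated text from the teacher rather than tiny linear layers):

    import torch
    import torch.nn.functional as F

    teacher = torch.nn.Linear(16, 4)   # stand-in for a large trained model
    student = torch.nn.Linear(16, 4)   # the new model being trained
    opt = torch.optim.SGD(student.parameters(), lr=0.1)
    T = 2.0                            # temperature softens the teacher's distribution

    for _ in range(100):
        x = torch.randn(32, 16)        # inputs we can query the teacher on
        with torch.no_grad():
            # the teacher's outputs become the training data; its weights
            # are never read, only its predictions
            soft_targets = F.softmax(teacher(x) / T, dim=-1)
        log_probs = F.log_softmax(student(x) / T, dim=-1)
        loss = F.kl_div(log_probs, soft_targets, reduction="batchmean") * T * T
        opt.zero_grad()
        loss.backward()
        opt.step()

The key point is in the `no_grad` block: all the student ever sees is what the teacher says about some inputs, which is why you can distill from a model you only have API access to.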