Comment by khazhoux
2 months ago
What I still don’t understand is how one slurps out an entire model (closed source) though.
Does the deepseek paper actually say what model it’s trained off of, or do they claim the entire thing is from scratch?
AFAIK DeepSeek have not publicly acknowledged training their model on OpenAI output - the OpenAI people have alleged that they did.
At any rate, I don't think distillation involves 'slurping out' the whole model. As I understand it, it means using the other model's outputs as training data for your new model. Maybe it's analogous to an expert teaching a novice by providing carefully selected examples, without having to expose the novice to all the blind alleys the expert went down to achieve mastery.
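To make that concrete, here's a toy sketch of the idea (not DeepSeek's actual method, and the numbers are made up): the "teacher" is a black box whose internals the student never sees; the student only observes the teacher's output distribution over tokens and trains itself to match it.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical "teacher": a fixed output distribution over 5 tokens.
# Its logits stand in for a closed-source model's internals, which the
# student never gets to see directly.
teacher_logits = np.array([2.0, 0.5, -1.0, 0.1, 1.2])
teacher_probs = softmax(teacher_logits)  # soft labels: the teacher's *output*

# "Student" starts from scratch and is trained only on the teacher's outputs.
student_logits = np.zeros(5)

lr = 0.5
for _ in range(200):
    p = softmax(student_logits)
    # Gradient of cross-entropy(teacher_probs, student) w.r.t. the logits.
    student_logits -= lr * (p - teacher_probs)

# After training, the student closely mimics the teacher's output
# distribution, without ever having had access to the teacher's weights.
print(np.round(softmax(student_logits), 3))
```

The point of the toy: nothing is "slurped out" of the teacher except examples of its behaviour, which is exactly what makes distillation hard to prevent when a model is exposed via an API.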