Comment by torginus

2 months ago

If what you say is true - that distilling LLMs is easy and cheap, while pushing the SOTA without a better model to rely on is dang hard and expensive - then the economics of LLM development might not be attractive to investors. Spending billions only to have your competitors come out with products that are 99% as good, and cost them pennies to train, does not sound like a good business strategy.

What I still don’t understand is how one slurps out an entire model (closed source) though.

Does the DeepSeek paper actually say which model it was trained off of, or do they claim the entire thing is from scratch?

  • AFAIK DeepSeek has not publicly acknowledged training their model on OpenAI output - it's the OpenAI people who have alleged that they did.

    At any rate, I don't think distillation involves 'slurping out' the whole model. As I understand it, it means using the other model's outputs as training data for your new model. Maybe it's analogous to an expert teaching a novice by providing carefully selected examples, without having to expose the novice to all the blind alleys the expert went down to achieve mastery.
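
    To make that concrete, here's a minimal sketch of the usual distillation loss: the student is trained to match the teacher's output *distribution* (its soft labels), not its weights. All names and the temperature value are illustrative, not from any actual DeepSeek or OpenAI code.

    ```python
    import math

    def softmax(logits):
        # Turn raw scores into a probability distribution.
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def distillation_loss(teacher_logits, student_logits, temperature=2.0):
        # Cross-entropy of the student's softened distribution against the
        # teacher's. Note the student only ever sees the teacher's outputs;
        # the teacher's weights stay closed.
        t = softmax([x / temperature for x in teacher_logits])
        s = softmax([x / temperature for x in student_logits])
        return -sum(ti * math.log(si) for ti, si in zip(t, s))

    # The loss shrinks as the student's outputs approach the teacher's.
    matched = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
    diverged = distillation_loss([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0])
    ```

    In practice you'd minimize this loss over a large corpus of the teacher's responses, which is why API access alone can be enough, no need to copy the model itself.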