Comment by zozbot234

6 months ago

The small models published as part of the DeepSeek release are not a "distilled DeepSeek"; they're fine-tuned variants of Llama and Qwen. DeepSeek may have smaller models internally that are not Llama- or Qwen-based, but if so, they haven't released them.

Thank you. I’m still learning as I’m sure everyone else is, and that’s a distinction I wasn’t aware of. (I assumed “distilled” meant a compressed parameter size, not necessarily the use of another model in its construction.)