Comment by TeMPOraL
8 hours ago
DeepSeek R1 was a famous case: not only did it briefly beat the then-SOTA on the cheap, it was also released with distilled versions that preserved the bulk of the improvements but could be run on higher-end consumer hardware.
And of course Gemma models are said to be distillations of Gemini.
The distillation you're talking about is about cutting the number of weights; it has nothing to do with extracting Q&A pairs from another model.