Comment by epolanski

10 hours ago

If your claim is so solid, you'll have no problem pointing out data or evidence.

DeepSeek R1 was a famous case - not only did it briefly beat the then-SOTA on the cheap, it was also released with distilled versions that preserved the bulk of the improvements but could run on higher-end consumer hardware.

And of course Gemma models are said to be distillations of Gemini.

  • The distillation you're talking about is about cutting the number of weights; it has nothing to do with extracting Q&A pairs from another model.