Comment by nonethewiser

18 hours ago

Distilled models are necessarily behind as long as frontier models keep progressing, and they are progressing. Maybe that progress will stop at some point in the future.

And Berkeley's "The False Promise of Imitating Proprietary LLMs" found that imitation closes the style gap quickly but leaves a large capability gap.

https://arxiv.org/abs/2305.15717

Curiously, this isn't always true.

For example, GLM 5.1 is more capable at pentesting than the model from which it is alleged to have been distilled [1].

Intuitively, this makes some sense: you can "distill" from multiple frontier models, and you can further post-train the distilled model. But I'm not sure exactly what happened with GLM 5.1.

[1]: https://dualuse.dev/posts/chinese-models-are-sometimes-bette...

  • Interesting blog post, thanks for sharing.

    I'm curious how that comparison controls for Opus refusing (whether explicitly, or just deciding not to pursue a path) given the caption below the first image:

    > A perfect score means the model autonomously found and exploited the vulnerability.

    I'm not really suggesting that it's misleading, but wondering if I'm missing something. Otherwise I guess it seems unsurprising that you can distill a better-performing model [in specific focused areas] by simply not distilling refusals?

    • Thanks!

      For that eval, I used an account that was labeled as a known red-teaming org by Anthropic, and I read the traces. There were no refusals or obvious avoidance behaviors, though it may have been silently nerfed.

      On the same eval, Opus 4.7 and 4.8 outperformed GLM 5.1, but GLM 5.2 is on par again with Opus. So it's at least partially measuring capabilities without respect to refusals.

      One possible contributing factor is that model capabilities are shaped differently (an example of this is GLM 5.1 vs. DeepSeek v4 Pro: https://dualuse.dev/posts/deepseek-v4-thinks-different). So if you use RL-based "distillation" from multiple models like Opus 4.x and GPT 5.x, you could get a more capable model.
