Comment by lebovic

18 hours ago

Thanks!

For that eval, I used an account that was labeled as a known red-teaming org by Anthropic, and I read the traces. There were no refusals or obvious avoidance behaviors, though it may have been silently nerfed.

On the same eval, Opus 4.7 and 4.8 outperformed GLM 5.1, but GLM 5.2 is on par again with Opus. So it's at least partially measuring capabilities without respect to refusals.

One possible contributing factor is that model capabilities are shaped differently (an example of this is GLM 5.1 vs. DeepSeek v4 Pro: https://dualuse.dev/posts/deepseek-v4-thinks-different). So if you use RL-based "distillation" from multiple models like Opus 4.x and GPT 5.x, you could get a more capable model.