Comment by red2awn

5 hours ago

The "distillation attacks" are mostly using Claude as LLM-as-a-judge. They are not training on the reasoning chains in a SFT fashion.

So they're paying for expensive input tokens to extract at best a tiny amount of information ("judgment") per request? That's even less like "distillation" than the other claim, that they tried to reverse-engineer the model's reasoning by prompting it to think step by step.

  • LLM-as-a-judge is a quite effective method for RLing a model, similar to RLHF but more objective and scalable. But yes, Anthropic is making it sound more serious than it is. Plus DeepSeek only did it for 125k requests, significantly fewer than the other labs, yet Anthropic still listed them first to create FUD.
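
The loop being described can be sketched in a few lines: the judge model returns a scalar reward per request, and training (or best-of-n sampling) prefers high-reward outputs. This is a hypothetical toy, not anyone's actual pipeline: the keyword-overlap `judge_score` is a stand-in for a real API call to a judge model, and all names are made up.

```python
# Toy sketch of LLM-as-a-judge as a reward signal (hypothetical).
# A real setup would replace judge_score with an API call to the
# judge model, which is why each "attack" request yields only a
# small scalar of information.

def judge_score(question: str, answer: str) -> float:
    """Stand-in judge: scalar reward in [0, 1].
    Faked here with word overlap against the question."""
    q_words = set(question.lower().split())
    a_words = set(answer.lower().split())
    if not q_words:
        return 0.0
    return len(q_words & a_words) / len(q_words)

def best_of_n(question: str, candidates: list[str]) -> str:
    """Keep the highest-reward sample, the way an RL-style update
    would upweight high-reward completions."""
    return max(candidates, key=lambda a: judge_score(question, a))

question = "why is the sky blue"
candidates = [
    "rayleigh scattering makes the sky blue",
    "clouds",
    "no idea",
]
best = best_of_n(question, candidates)
```

The point of the sketch: each request returns one number, not a reasoning chain, which is why calling this "distillation" is a stretch.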