Comment by light_hue_1

1 year ago

S1 has no relationship to R1. It's a marketing campaign for an objectively terrible and unrelated paper.

S1 is trained with full supervision by distilling Gemini. R1 is trained with reinforcement learning, using a much weaker judge LLM.

They don't follow the same scaling laws. They don't give you the same results. They don't have the same robustness. You can use R1 for your own problems. You can't use S1 unless Gemini works already.

We know that distillation works and is very cheap. This has been true for a decade; there's nothing here.

S1 is a rushed hack job (they didn't even run most of their evaluations, with the excuse that the Gemini API is too hard to use!) that probably existed before R1 was released and was then pivoted into this mess.
