Comment by GTP

3 months ago

Sorry for being lazy, but I just don't have the time right now to read the paper. Is there, in the paper or somewhere else, a benchmark comparison of S1 vs R1 (the full R1, not a quantized or distilled version)?

The S1 paper is not meant to compete with R1. It simply shows that with 1k well-curated examples for finetuning (26 minutes of training on 16 GPUs) and a simple hack for controlling the length of the thinking process, one can dramatically improve the performance of a non-reasoning model and get a clear gain from additional test-time compute. It is worth a quick skim.
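
For anyone curious, the "simple hack" is what the paper calls budget forcing: at decode time, you either cut the thinking block short or keep it going past the point where the model wants to stop. Below is a minimal sketch of the idea, not the paper's actual code; `generate`, `count_tokens`, the delimiter strings, and the token budgets are placeholders for whatever your inference stack provides.

```python
def count_tokens(text: str) -> int:
    # Placeholder: a real implementation would use the model's tokenizer.
    return len(text.split())


def generate_with_budget(generate, prompt: str,
                         min_think_tokens: int = 0,
                         max_think_tokens: int = 2048,
                         end_think: str = "</think>",
                         wait_token: str = "Wait"):
    """Sketch of budget forcing.

    `generate(text, stop, max_new_tokens)` is a hypothetical wrapper around
    your model's decoding loop that returns only the newly generated text.
    """
    trace = ""
    while True:
        remaining = max_think_tokens - count_tokens(trace)
        if remaining <= 0:
            break  # hard cap reached: stop thinking regardless
        # Decode until the model tries to close its thinking block
        # or the remaining budget is exhausted.
        trace += generate(prompt + trace, stop=end_think, max_new_tokens=remaining)
        if count_tokens(trace) >= min_think_tokens:
            break
        # Below the minimum budget: suppress the end-of-thinking delimiter
        # and nudge the model to keep reasoning by appending "Wait".
        trace += " " + wait_token
    # Force the thinking phase to end and decode the final answer.
    answer = generate(prompt + trace + end_think + "\nFinal answer:",
                      stop=None, max_new_tokens=512)
    return trace, answer
```

The "Wait" nudge is what lets them dial test-time compute up and observe the accuracy improving with longer thinking traces.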