Comment by pama

1 year ago

The S1 paper is not meant to compete with R1. It simply shows that with 1k well-curated examples for fine-tuning (26 minutes of training on 16 GPUs) and a simple hack for controlling the length of the thinking process, one can dramatically improve the performance of a non-reasoning model and show a clear benefit from increased test-time compute. It is worth a quick skim.
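For anyone curious, the length-control hack (the paper calls it budget forcing) can be sketched roughly like this. This is a toy version, not the paper's code. The `next_token` callable, the `</think>` delimiter, and the token-list interface are all assumptions standing in for a real model's decoding loop. If the model tries to stop thinking too early, the end-of-thinking delimiter is suppressed and "Wait" is appended instead. If it runs past the budget, the delimiter is forced.

```python
from typing import Callable, List

# Hypothetical delimiter and continuation token; real models use their own.
END_THINK = "</think>"
WAIT = "Wait"


def budget_forced_thinking(
    next_token: Callable[[List[str]], str],
    prompt: List[str],
    min_tokens: int,
    max_tokens: int,
) -> List[str]:
    """Toy decoding loop that keeps the thinking length within [min_tokens, max_tokens]."""
    tokens = list(prompt)
    thought = 0  # number of thinking tokens emitted so far
    while True:
        if thought >= max_tokens:
            # Budget exhausted: force the end of the thinking phase.
            tokens.append(END_THINK)
            return tokens
        tok = next_token(tokens)
        if tok == END_THINK:
            if thought >= min_tokens:
                # Model stopped after the minimum budget: accept it.
                tokens.append(END_THINK)
                return tokens
            # Model tried to stop too early: replace the delimiter with "Wait".
            tokens.append(WAIT)
        else:
            tokens.append(tok)
        thought += 1


# A stand-in "model" that always wants to stop immediately gets nudged to keep going.
print(budget_forced_thinking(lambda t: END_THINK, ["Q"], min_tokens=2, max_tokens=10))
```

With a real model, `next_token` would wrap one step of sampling, and the appended "Wait" tends to make the model re-check its reasoning rather than just pad the output.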
