Comment by Philpax

1 year ago

The fundamental innovation is training the model to reason through reinforcement learning; you can train existing models with traces from these reasoning models to get you within the same ballpark, but taking it further requires you to do RL yourself.

0 comments