Comment by Philpax
5 months ago
The fundamental innovation is using reinforcement learning to train the model to reason; you can fine-tune existing models on traces from these reasoning models to get within the same ballpark, but taking it further requires doing RL yourself.