Comment by josh-sematic
1 year ago
One reason why I'm excited about o1 is that it seems like OpenAI have cracked the nut of effective RL during training time, which takes us out of the domain of just fitting to the curve of "what a human would have said next." I just finished writing a couple blog posts about this; the first [1] covers some problems with that approach and the second [2] talks about what alternatives might look like.
[1] https://www.airtrain.ai/blog/how-openai-o1-changes-the-llm-t... [2] https://www.airtrain.ai/blog/how-openai-o1-changes-the-llm-t...
No comments yet
Contribute on Hacker News ↗