One reason why I'm excited about o1 is that it seems like OpenAI has cracked the nut of effective RL at training time, which takes us out of the domain of just fitting the curve of "what a human would have said next." I just finished writing a couple of blog posts about this; the first [1] covers some problems with that approach and the second [2] talks about what alternatives might look like.
[1] https://www.airtrain.ai/blog/how-openai-o1-changes-the-llm-t... [2] https://www.airtrain.ai/blog/how-openai-o1-changes-the-llm-t...
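Since the comment contrasts fitting "what a human would have said next" with RL during training, here is a minimal sketch of that distinction on a toy PyTorch model. This is not OpenAI's (unpublished) o1 recipe; the tiny model, the `reward_fn`, and all names are illustrative assumptions.

```python
# Toy contrast: supervised next-token fitting vs. a REINFORCE-style reward update.
# Purely illustrative; not a description of how o1 was actually trained.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, hidden = 16, 32
model = nn.Sequential(nn.Embedding(vocab_size, hidden), nn.Linear(hidden, vocab_size))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def supervised_step(tokens):
    """'Fit the curve of what a human said next': cross-entropy on the next token."""
    logits = model(tokens[:-1])               # predict token t+1 from token t
    loss = F.cross_entropy(logits, tokens[1:])
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def rl_step(prompt, reward_fn, steps=8):
    """Reward-driven update: sample a continuation, score it, reinforce its log-probs."""
    tokens, log_probs = [prompt], []
    for _ in range(steps):
        logits = model(torch.tensor(tokens[-1:]))
        dist = torch.distributions.Categorical(logits=logits[-1])
        tok = dist.sample()
        log_probs.append(dist.log_prob(tok))
        tokens.append(tok.item())
    reward = reward_fn(tokens)                 # e.g. "did the answer check out?"
    loss = -reward * torch.stack(log_probs).sum()   # REINFORCE objective
    opt.zero_grad()
    loss.backward()
    opt.step()
    return reward

# Usage with a made-up reward: prefer continuations with many even tokens.
supervised_step(torch.randint(0, vocab_size, (10,)))
rl_step(prompt=3, reward_fn=lambda toks: sum(t % 2 == 0 for t in toks) / len(toks))
```

The point of the sketch: the supervised step only ever pulls the model toward a reference continuation, while the RL step can reinforce whatever sampled output scores well under the reward, even if no human wrote it.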
> After reading this paper, I am now
Is this your paper?
Author: @fandzomga Username: fsndz
Why try to funnel us to your paywalled article?
I would like to read it, but it's under a paywall.
https://archive.is/nGaiU
The paper is paywalled; just logging into Medium won't get you past it.
sorry for the paywall, you can read the free version here: https://www.lycee.ai/blog/why-no-agi-openai
TLDR: “statistically fitting token output is not the same as human intelligence, and human intelligence and AGI are contradictory anyways (because humans make mistakes)”
Saved you the paywall click to the poorly structured medium article :)