Comment by fsndz

1 year ago

[flagged]

One reason why I'm excited about o1 is that it seems like OpenAI have cracked the nut of effective RL during training time, which takes us out of the domain of just fitting to the curve of "what a human would have said next." I just finished writing a couple blog posts about this; the first [1] covers some problems with that approach and the second [2] talks about what alternatives might look like.

[1] https://www.airtrain.ai/blog/how-openai-o1-changes-the-llm-t... [2] https://www.airtrain.ai/blog/how-openai-o1-changes-the-llm-t...

Author: @fandzomga Username: fsndz

Why try to funnel us to your paywalled article?

TLDR: “statistically fitting token output is not the same as human intelligence, and human intelligence and AGI are contradictory anyways (because humans make mistakes)”

Saved you the paywall click to the poorly structured medium article :)