Comment by Bjorkbat
7 days ago
I recall it as less an evolution and more a complete tonal shift the moment o3 was evaluated on ARC-AGI. I remember on Twitter Sam made some dumb post suggesting they had beaten the benchmark internally, and Francois called him out on his vagueposting. As soon as they publicly released the scores, it was like he was all-in on reasoning.
Which I have to admit I was kind of disappointed by.
What exactly is "reasoning"?
In this context I believe it refers to models that are trained to generate an internal dialogue that is then fed back in as additional input. This cycle might be performed several times before generating the final output text.
This is in contrast to how GPT-2/3/“original 4” work: they repeatedly generate the next finalized token based on the full dialogue thus far.
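A simplified sketch of that contrast, assuming a toy stand-in for the model (`scripted_model` just replays a fixed list of tokens) and hypothetical `<think>`/`</think>` markers for the internal dialogue. Real reasoning models differ in the details. In both loops every generated token is fed back into the context; the difference is that in the "reasoning" version the tokens between the markers are kept as input but hidden from the final output.

```python
from typing import Callable, List

# Hypothetical stand-in for a language model: given the full token context,
# return the next token. A real model would sample from a learned distribution.
NextToken = Callable[[List[str]], str]


def generate(model: NextToken, prompt: List[str], max_tokens: int = 50) -> List[str]:
    """Plain autoregressive decoding: every generated token is final output."""
    context = list(prompt)
    output: List[str] = []
    for _ in range(max_tokens):
        tok = model(context)
        if tok == "<eos>":
            break
        context.append(tok)  # the token is fed back in as input...
        output.append(tok)   # ...and is also shown to the user
    return output


def generate_with_reasoning(model: NextToken, prompt: List[str], max_tokens: int = 50) -> List[str]:
    """Same loop, but tokens between the hypothetical <think>...</think> markers
    stay in the context (later tokens can condition on them) and are hidden."""
    context = list(prompt)
    output: List[str] = []
    thinking = False
    for _ in range(max_tokens):
        tok = model(context)
        if tok == "<eos>":
            break
        context.append(tok)  # internal dialogue is still fed back as input
        if tok == "<think>":
            thinking = True
        elif tok == "</think>":
            thinking = False
        elif not thinking:
            output.append(tok)  # only non-thinking tokens reach the user
    return output


def scripted_model(script: List[str], prompt_len: int) -> NextToken:
    """Toy 'model' that replays a fixed script, ignoring what the context says."""
    def model(context: List[str]) -> str:
        i = len(context) - prompt_len
        return script[i] if i < len(script) else "<eos>"
    return model


prompt = ["What", "is", "2+3", "?"]
script = ["<think>", "2+3", "=", "5", "</think>", "5"]
print(generate(scripted_model(script, len(prompt)), prompt))
# → ['<think>', '2+3', '=', '5', '</think>', '5']
print(generate_with_reasoning(scripted_model(script, len(prompt)), prompt))
# → ['5']
```

The toy model ignores its input, so this only shows the plumbing (what gets fed back versus what gets shown), not how a trained model decides what to think.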
Who invented internal dialogue?