Comment by HarHarVeryFunny
13 hours ago
It's different because a chat model has been post-trained for chat, while o1/o3 have been post-trained for reasoning.
Imagine trying to have a conversation with someone who's been told to interpret anything said to them as a problem they need to reason about and solve. I doubt you'd give them high marks for conversational skill.
Ideally one model could do it all, but for now these models are apparently trained with reinforcement learning to steer responses toward a single goal (gaming human feedback, or successful reasoning).
TFA, and my response, are about a de novo relationship between task completion and the input prompt, not about conversational skill.