
Comment by HarHarVeryFunny

5 days ago

Which is why models like o1 & o3, which use heavy RL to boost reasoning performance, may perform worse in other areas where greater diversity of output is needed.

Of course, humans employ different thinking modes too - no harm in thinking like a stone-cold programmer when you are programming, as long as you don't do it all the time.

This seems wrong. Reasoning scales all the way up to the discovery of quaternions and general relativity, often requiring divergent thinking. A core aspect of reasoning is maintaining uncertainty to enable better exploration, and knowing when it's time to go back to the drawing board and start over from scratch. Being overconfident to the point of over-constraining the possibility space will harm exploration, and only works effectively for "reasoning" problems where the answer is already known or nearly fully known. A process that yields limited diversity will not cover the full range of problems to which reasoning can be applied. In other words, your statement is roughly equivalent to saying o3 cannot reason in domains involving innovative or untested approaches.

  • > Reasoning scales all the way up to the discovery of quaternions and general relativity, often requiring divergent thinking.

    That would be true only if everything we accept as fact or truth came through reasoning in a fully logical, waking state. But it did not: if you dig even a little, you'll find plenty of revelation attributed to dreams, divine inspiration, and other subconscious sources that governs people's lives and also science.

    • I'd also like to point out serendipitous external input as well - Isaac Newton watching the apple fall from the tree, for instance. Often, thought processes are steered by external stimuli that happen to occur while the thought process is taking place.