
Comment by HarHarVeryFunny

5 days ago

Which is why models like o1 & o3, which use heavy RL to boost reasoning performance, may perform worse in other areas where greater diversity of output is needed.

Of course, humans employ different thinking modes too - no harm in thinking like a stone-cold programmer when you are programming, as long as you don't do it all the time.

This seems wrong. Reasoning scales all the way up to the discovery of quaternions and general relativity, often requiring divergent thinking. A core aspect of reasoning is maintaining uncertainty to enable better exploration, and knowing when it's time to go back to the drawing board and start over from scratch. Being overconfident to the point of over-constraining the possibility space will harm exploration, and only works effectively for "reasoning" problems where the answer is already known or nearly fully known. A process that yields limited diversity will not cover the full range of problems to which reasoning can be applied. In other words, your statement is roughly equivalent to saying o3 cannot reason in domains involving innovative or untested approaches.

  • > Reasoning scales all the way up to the discovery of quaternions and general relativity, often requiring divergent thinking.

    That would be true only if everything we accept as fact or truth came through reasoning in a fully logical, waking state. But it did not: if you dig even a little, you'll find plenty of revelation attributed to dreams, divine inspiration, and other subconscious sources that governs people's lives and also science.

    • I'd also like to point out serendipitous external input as well - Isaac Newton watching the apple fall from the tree, for instance. Often, thought processes are steered by external stimuli that happen to occur while the thought process is taking place.