Comment by Aperocky
3 months ago
For all the hype about thinking models, this feels much more like compression in the information-theory sense than a "takeoff" scenario.
There is a finite amount of information stored in any large model; the models are really good at presenting the correct information back, and adding thinking blocks made them even better at doing that. But there is a cap to that.
Just as there is a theoretical limit to how far you can compress a file before the compression has to become lossy, there is a limit to how much relevant information a model can produce, no matter how long it is forced to think.
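To make that limit concrete, here's a rough Python sketch (example.bin is just a placeholder path, not anything from the thread): it compares a file's zeroth-order byte entropy, a floor for how small any symbol-by-symbol lossless coder can make it, against what gzip actually produces.

    import gzip
    import math
    from collections import Counter

    def byte_entropy(data: bytes) -> float:
        # Shannon entropy in bits per byte, estimated from observed byte frequencies.
        counts = Counter(data)
        n = len(data)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    # "example.bin" is a placeholder; point it at any non-empty file.
    data = open("example.bin", "rb").read()

    # Lower bound in bytes for any coder that compresses one byte at a time.
    entropy_floor = byte_entropy(data) * len(data) / 8

    # What an actual general-purpose coder achieves.
    gz_size = len(gzip.compress(data))

    print(f"original: {len(data)} B  gzip: {gz_size} B  zeroth-order floor: {entropy_floor:.0f} B")

gzip can beat the zeroth-order floor on structured data by exploiting repeats, but the true source entropy is still the hard limit for any lossless scheme, which is the sense of "theoretical maximum" I mean here.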
I think an interesting avenue to explore is creating abstractions and analogies. If a model can take a novel situation and create an analogy to one that it is familiar with, it would expand its “reasoning” capabilities beyond its training data.
I think this is probably accurate, and what remains to be seen is how "compressible" the larger models are.
The fact that we can compress a GPT-3-sized model into an o1 competitor is only the beginning. Maybe there is even more juice to squeeze there?
But even more: how much performance will we get out of o3-sized models? That is what is exciting, since they are already performing near PhD level on most evals.
My thinking (hope?) is that the reasoning models will be more like a calculator, which doesn't have to "remember" every possible combination of addition, multiplication, etc. for all the numbers, but can actually compute the results.
As reasoning improves, the models could start with a basic set of principles and build from there. Of course, for facts grounded in reality, RAG would still likely be best, but maybe with enough "reasoning" a model could simulate an approximation of the universe well enough to get to an answer.