Comment by red75prime
6 months ago
Conversely, if they didn't cheat and funded the creation of the test suite to get "clean" problems (while hiding their involvement so the problems wouldn't be tailored specifically to be hard for LLMs), then they have no reason to fear that all this looks fishy: the test results will soon be vindicated once they give wider access to the model.
I refrain from forming a strong opinion in such situations. My intuition tells me that it's not cheating. But, well, it's intuition (probably based on my belief that the brain is nothing special physics-wise and it doesn't manage to realize unknown quantum algorithms in its warm and messy environment, so classical computers can reproduce all of its feats given appropriate algorithms and enough computing power. And math reasoning is just another step on the ladder of capabilities, not something that requires a completely different approach). So, we'll see.
> based on my belief that the brain is nothing special physics-wise and it doesn't manage to realize unknown quantum algorithms in its warm and messy environment
Agreed (well, as far as intuition goes), but current-gen AI is not a brain, much less a human brain. It shows similarities, in particular emerging multi-modal pattern-matching capabilities. But nothing says that's all the neocortex does; in fact, the opposite is well established in neuroscience. We just don't know all of its functions yet - we can't simply ignore a massive Chesterton's fence we don't understand.
This isn't even necessarily because the brain is more sophisticated than anything else; we don't have good models for the weather, the immune system, or anything chaotic, really. Look, protein folding is still a research problem, and that's at the level of known molecular structure. We greatly overestimate our ability to model and simulate things. Today's AI is a prime example of our wishful thinking and glossing over "details".
> so classical computers can reproduce all of its feats given appropriate algorithms and enough computing power
Sure. That’s a reasonable hypothesis.
> And math reasoning is just another step on the ladder of capabilities, not something that requires a completely different approach
You seem to be assuming "ability" is a single axis. It's like assuming that if we get 256-bit registers, computers will start making coffee, or that going to the gym will eventually give you wings. Nothing suggests this. In fact, pattern-matching ability has improved enormously while reasoning on novel problems has sat basically still, which strongly suggests we are looking at a multi-axis problem domain.
> pattern-matching ability has improved enormously while reasoning on novel problems has sat basically still
About two years ago I came to the view that autoregressive models of reasonable size will not be able to capture the full range of human abilities (mostly due to the limited compute per token). So this is no surprise to me. But training based on reinforcement learning might be able to overcome it.
I don't believe that specialized mechanisms are required to do math.
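
To make the "limited compute per token" point concrete, here is a rough back-of-the-envelope sketch (my own illustration; the formula is a standard FLOP approximation for a decoder-only transformer, not anything from the model under discussion): the forward pass spends the same number of FLOPs on every generated token, whether that token is trivial filler or the pivotal step of a proof.

```python
# Rough sketch: per-token compute in a decoder-only transformer is fixed
# by the architecture, independent of how hard the current step is.
def flops_per_token(n_layers: int, d_model: int, seq_len: int) -> float:
    """Approximate forward-pass FLOPs to generate one token."""
    attn = 4 * d_model**2 + 2 * seq_len * d_model  # QKV/output projections + attention
    mlp = 8 * d_model**2                           # two linear layers with 4x expansion
    return n_layers * (attn + mlp)

# E.g. a GPT-3-scale config (96 layers, d_model=12288) at 2048 context:
print(f"{flops_per_token(96, 12288, 2048):.2e}")  # ~1.8e11 FLOPs, same for easy or hard tokens
```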