Comment by dauhak
1 day ago
I think this is valid criticism, but it's also unclear how much this is an "inherent" shortcoming versus the kind of thing that's pretty unsurprising given we're only seeing the first generation of this new model paradigm.
Like, I'm as sceptical of "line goes up" extrapolation of performance as anyone, but assuming that current flaws will continue to be flaws seems equally wrong-headed/overconfident. The past 5 years or so have been a constant trail of these predictions being wrong (remember when people thought artists would be safe because AI clearly couldn't do hands?). Now that everyone's woken up to this RL approach, we'll probably see very quickly over the next couple of years whether these issues persist.
(Really like the problem though, seems like a great test)
Yeah, that's a great point. While this is evidence that the sort of behavior LeCun predicted is currently displayed by some reasoning models, it would be going too far to say it's evidence that it will always be displayed. In fact, one could even take a more optimistic view: if models that do this can score 90+% on AIME and so on, imagine what a model that had ironed out these kinks could do with the same number of thinking tokens. I feel like we'll just have to wait and see whether that pans out.