Comment by spprashant

6 hours ago

I tend to prefer the ARC-AGI benchmarks for the most part. But it's always interesting that when a new version drops, all the frontier models fall to less than 20% or something. And then over the next few releases they get all the way up to 80%+. If you use the models, it doesn't feel like they are that much more generally intelligent.

Most frontier models are terrible at ARC-AGI-3 right now.

These models are already great, no question, but are they really going to be that much more intelligent when we hit 80% again?