Comment by spprashant
4 hours ago
I tend to prefer the ARC-AGI benchmarks for the most part. But it's always interesting that when a new version drops, all the frontier models score less than 20% or so. Then, within the next few releases, they climb all the way up to 80%+. Yet if you actually use the models, it doesn't feel like they are that much more generally intelligent.
Most frontier models are terrible at ARC-AGI-3 right now.
These models are already great, no question, but are they really going to be that much more intelligent when we hit 80% again?