Comment by grzracz

17 hours ago

Seems completely backwards to me. This is like judging Formula 1 just by the raw power of the engine. The rest of the car has just as much engineering, if not more.

wyre 16 hours ago

ARC-AGI is testing raw intelligence, like the raw power of a Formula 1 engine. The rest of the car is the harness.

gchamonlive 16 hours ago

Maybe there is a complex relationship between harness, model and the emergent perceived intelligence that we just can't access by isolating the model alone to evaluate "raw intelligence". I don't think it's absurd to imagine a model that by itself wouldn't be that impressive, but would outperform other models given the right harness. It's also not absurd to think of a model that has incredible raw intelligence, but would not scale much with different harnesses. Model performance across different scenarios depends a LOT on dataset and training strategies, so we need to account for these complex relationships. Otherwise, measuring "raw intelligence" would be the next AI benchmark that is purely for show.