Comment by himata4113

20 hours ago

That's why ARC-AGI-3 doesn't allow the use of a harness. The model has to create the harness instead.


himata4113

Seems completely backwards to me. This is like judging Formula 1 just by the raw power of the engine. The rest of the car has just as much engineering, if not more.

wyre 16 hours ago
ARC-AGI is testing raw intelligence, like the raw power of a Formula 1 engine. The rest of the car is the harness.
- gchamonlive 16 hours ago

  Maybe there is a complex relationship between the harness, the model, and the emergent perceived intelligence that we just can't capture by isolating the model to evaluate "raw intelligence". I don't think it's absurd to imagine a model that wouldn't be that impressive by itself but would outperform other models given the right harness. It's also not absurd to imagine a model with incredible raw intelligence that wouldn't scale much across different harnesses. Model performance in different scenarios depends a LOT on the dataset and training strategy, so we need to account for these complex relationships; otherwise "raw intelligence" will just become the next AI benchmark that is purely for show.

vova_hn2 17 hours ago

The model is not allowed to create a harness either, I think.

himata4113 15 hours ago

It can; it just has to be within the same 'session'. But afaik it's mostly limited to scratch notes, since there's no Python or bash. So yeah, if there's no way to execute code, there's no real way to build a harness.