← Back to context

Comment by himata4113

20 hours ago

That's why ARC-AGI-3 doesn't allow the use of a harnesses. The model has to create the harness instead.

Seems completely backwards to me. This is like judging Formula 1 just by the raw power of the engine. The rest of the car has just as much engineering, if not more.

  • ARC-AGI is testing raw intelligence, like the raw power of a Formula 1 engine. The rest of the car is the harness.

    • Maybe there is a complex relationship between harness, model and the emergent perceived intelligence we just can't access by isolating the model alone to evaluate "raw intelligence". I don't think it's absurd to imagine a model that by itself wouldn't be that impressive, but would outperform other models given the right harness. It's also not absurd to think of a model that has incredible raw intelligence, but would not scale much with different harnesses. Model performance given different scenarios depend a LOT on dataset and training strategies, so we need to account for these complex relationships, otherwise measuring "raw intelligence" would be the next AI benchmark that is purely for show.

The model is not allowed to create a harness either, I think.

  • it can, it just has to be within the same 'session', but it's mostly limited to scratch notes afaik since there's no python or bash, yah if there's no way to execute code there's no real way to build a harness.