Comment by nxobject

7 days ago

I understand Chollet is transparent that the "branding" of the ARC-AGI-n suites is meant to be suggestive of their purpose rather than substantive.

However, it does rub me the wrong way - as someone who's cynical about how branding can enable breathless AI hype through bad journalism. A hypothetical comparison would be labelling SHRDLU's (1968) performance on Blocks World planning tasks as "ARC-AGI-(-1)".[0]

A less loaded name like (bad strawman option) "ARC-VeryToughSymbolicReasoning" should capture how the ARC-AGI-n suite is genuinely and intrinsically very hard for current AIs, and what progress satisfactory performance on the benchmark suite would represent. Chollet has spelled that out himself, and it has kept him grounded throughout! [1]

[0] https://en.wikipedia.org/wiki/SHRDLU

[1] https://arxiv.org/abs/1911.01547

I get what you're saying about perception being reality, and that the name ARC-AGI suggests that beating it means AGI has been achieved.

In practice, when I have seen ARC brought up, it has been discussed with more nuance than any of the other benchmarks.

Unlike Humanity's Last Exam, which is the most egregious example I have seen, both in its naming and in how it is referenced as a measure of an LLM's capability.