Comment by cubefox

7 days ago

> Getting a high score on ARC doesn't mean we have AGI and Chollet has always said as much AFAIK

He only seems to have started saying this recently, since OpenAI cracked the ARC-AGI benchmark. But in the abstract of the original 2019 paper he wrote this:

> We argue that ARC can be used to measure a human-like form of general fluid intelligence and that it enables fair general intelligence comparisons between AI systems and humans.

https://arxiv.org/abs/1911.01547

Now he seems to be backtracking: with the release of harder ARC-like benchmarks, he implies that the first one didn't actually test for truly general, human-like intelligence.

This sounds a bit like claiming that beating humans at chess would require general intelligence -- then adding, after Deep Blue beats Kasparov, that chess doesn't actually count as a test for AGI, and that Go is the real AGI benchmark. And after a narrow system beats Go, moving the goalposts to Atari, then to StarCraft II, then to Minecraft, and so on.

At some point, intuitively real "AGI" will indeed be necessary to beat one of these increasingly difficult benchmarks -- but only because, had it not been, yet another benchmark would have been invented. That makes these benchmarks mostly post hoc rationalizations.

A better approach would be to ask what went wrong when the very first benchmark was designed, and why the same thing wouldn't happen with the second.

Relative to humans, these models sure have ungodly amounts of knowledge, but they also kind of have a lobotomy: they have never moved through the world. It's remarkable they work as well as they do trained chiefly on text, but being so untethered from the only reality we know intelligence to have emerged from... frankly, what do we expect?