Comment by mNovak
8 days ago
I'm excited for the big jump in ARC-AGI scores from recent models, but no one should think for a second this is some leap in "general intelligence".
I joke to myself that the G in ARC-AGI is "graphical". I think what's held models back on ARC-AGI is their terrible spatial reasoning, and I'm guessing that's what the recent models have cracked.
Looking forward to ARC-AGI-3, which focuses on trial and error and exploring a set of constraints via games.
Agreed. I love the elegance of ARC, but it always felt like a gotcha to give spatial reasoning challenges to token generators, and the fact that the token generators are somehow beating it anyway really says something.
The average ARC-AGI-2 score for a single human is around 60%.
"100% of tasks have been solved by at least 2 humans (many by more) in under 2 attempts. The average test-taker score was 60%."
https://arcprize.org/arc-agi/2/
Worth keeping in mind that in this case the test takers were random members of the general public. The score of e.g. people with bachelor's degrees in science and engineering would be significantly higher.
Random members of the public = average human beings. I thought those were already classified as General Intelligences.
What is the point of comparing performance of these tools to humans? Machines have been able to accomplish specific tasks better than humans since the industrial revolution. Yet we don't ascribe intelligence to a calculator.
None of these benchmarks prove these tools are intelligent, let alone generally intelligent. The hubris and grift are exhausting.
What's the point of denying or downplaying that we are seeing amazing and accelerating advancements in areas that many of us thought were impossible?
> The hubris and grift are exhausting.
And moving the goalposts every few months isn't? What evidence of intelligence would satisfy you?
Personally, my biggest unsatisfied requirement is continual-learning capability, but it's clear we aren't too far from seeing that happen.
> Machines have been able to accomplish specific tasks...
Indeed, and the specific task machines are accomplishing now is intelligence. Not yet "better than human" (and certainly not better than every human) but getting closer.
Wouldn't you deal with spatial reasoning by giving it access to a tool that structures the space in a way it can understand or just is a sub-model that can do spatial reasoning? These "general" models would serve as the frontal cortex while other models do specialized work. What is missing?
That's a bit like saying just give blind people cameras so they can see.
I mean, no, not really. These models can see; you're giving them eyes to connect to that part of their brain.
They should train more on sports commentary, perhaps that could give spatial reasoning a boost.