Comment by maaaaattttt
7 days ago
I've said this somewhere else, but we have the perfect test for AGI in the form of any open-world game. Give the AGI the instruction that it should finish the game and tell it how to control it. Feed it the frames as input and wait. When I think of the latest Zelda games, and especially how the Shrine challenges are designed, they feel like the perfect environment for an AGI test.
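The proposed test reduces to a simple loop: the agent receives only raw frames plus the instructions, emits controller actions, and passes if the game reports completion. A minimal sketch of that harness, with all names (`GameEnv`, `run_test`, the `"advance"` action) being hypothetical placeholders rather than any real game API:

```python
# Sketch of the frames-in/actions-out test loop described above.
# Everything here is a stand-in: a real setup would feed raw pixels
# from an actual game and accept controller inputs.

from dataclasses import dataclass
from typing import Callable


@dataclass
class GameEnv:
    """Toy stand-in for an open-world game."""
    goal_distance: int = 5  # steps remaining until the 'game' is finished

    def frame(self) -> bytes:
        # Real harness: raw pixel buffer. Here: a placeholder byte string.
        return b"frame:%d" % self.goal_distance

    def act(self, action: str) -> bool:
        # Apply one controller input; return True once the game is finished.
        if action == "advance":
            self.goal_distance -= 1
        return self.goal_distance <= 0


def run_test(agent: Callable[[bytes], str], env: GameEnv, budget: int = 100) -> bool:
    """Pass/fail is unambiguous: did the agent finish within the budget?"""
    for _ in range(budget):
        if env.act(agent(env.frame())):
            return True
    return False


# A trivial scripted 'agent' that always advances. An AGI candidate would
# have to infer this policy from the frames and instructions alone.
print(run_test(lambda frame: "advance", GameEnv()))
```

The point of the sketch is the interface, not the game: success is a binary, externally checkable outcome, which is exactly the property the rest of the thread argues a good AGI test needs.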
And if someone makes a machine that does all that and another person says
"That's not really AGI because xyz"
What then? The difficulty in coming up with a test for AGI is coming up with something whose passing grade people will actually accept as AGI.
In many respects I feel like all of the claims that models don't really understand, or don't have internal representations, or whatever, tend to lean on nebulous or circular definitions of the properties in question. Trying to pin the arguments down usually ends up in dualism and/or religion.
Doing what Chollet has done is infinitely better: if a person can easily do something and a model cannot, then there is clearly something significant missing.
It doesn't matter what the property is or what it is called. Such tests might even help us see what those properties are.
Anyone who wants to claim a fundamental inability of these models should be able to provide a task where it is clearly possible to tell when it has been solved, and to show that humans can do it (if that's the bar we are claiming can't be met). If they are right, then no future model should be able to solve that class of problems.
Given your premise (which I agree with), I think the issue in general comes from the lack of a good, broadly accepted definition of what AGI is. My initial comment originates from the fact that in my internal definition, an AGI would have a de facto understanding of the physics of "our world". Or better, could infer them by trial and error. But, indeed, it doesn't have to be the case. (The other advantage of the Zelda games is that they introduce new abilities that don't exist in our world, and for which most children I've seen understand the mechanisms, and how they could be applied to solve a problem, quite naturally, even though they've never had that ability before.)
I'd say the issue is the lack of a good, broadly accepted definition of what the "I" is. We all know "smart" when we see it, but actually defining it in a rigorous way is tough.
> It doesn't matter what the property is or what it is called. Such tests might even help us see what those properties are.
This is a very good point and somewhat novel to me in its explicitness.
There's no reason to think that we already have the concepts and terminology to point out the gaps between the current state and human-level intelligence and beyond. It's incredibly naive to think we have already armchair-generated those concepts by pure self-reflection and philosophizing. This is obvious in fields like physics: experiments were necessary to even come up with the basic concepts of electromagnetism, relativity, or quantum mechanics.
I think the reason is that pure philosophizing is still more prestigious than getting down in the weeds and doing limited-scope, well-defined experiments on concrete things. So people feel smart wielding poorly defined concepts like "understanding", "reasoning", or "thinking", and contrasting them with "mere pattern matching". It's a bit like the stalemate that philosophy as a field often hits, as opposed to the more pragmatic approach of the sciences, where empirical contact with reality allows more consensus and clarity without getting caught up in mere semantics.
> The difficulty in coming up with a test for AGI is coming up with something that people will accept a passing grade as AGI.
The difficulty with intelligence is that we don't even know what it is in the first place. In a psychology sense, we don't even have a reliable model of anything that corresponds to what humans point at and call intelligence; IQ and g are really poor substitutes.
Add into that Goodhart's Law (essentially, propose a test as a metric for something, and people will optimize for the test rather than what the test is trying to measure), and it's really no surprise that there's no test for AGI.