Comment by TheAceOfHearts

7 days ago

Getting a high score on ARC doesn't mean we have AGI and Chollet has always said as much AFAIK. It's meant to push the AI research space in a positive direction. Being able to solve ARC problems is probably a prerequisite to AGI. It's a directional push into the fog of war, with the claim being that we should explore that area because we expect it's relevant to building AGI.

We don't really have a true test that means "if we pass this test we have AGI", but we have a variety of tests (like ARC) that we believe any true AGI would be able to pass. It's a "necessary but not sufficient" situation. This also ties directly to the challenge of defining what AGI really means. You see a lot of discussion of "moving the goal posts" around AGI, but as I see it we've never had goal posts; we've just got a bunch of lines we'd expect to cross before reaching them.

  • I don't think we actually even have a good definition of "This is what AGI is, and here are the stationary goal posts; when these thresholds are met, we will have AGI".

    If you judged human intelligence by our AI standards, then would humans even pass as Natural General Intelligence? Human intelligence tests are constantly changing, being invalidated, and rerolled as well.

    I maintain that today's LLMs would pass sufficiently for AGI and are also very close to passing a Turing Test, if measured against the expectations of 1950, when the test was proposed.

    • >I don't think we actually even have a good definition of "This is what AGI is, and here are the stationary goal posts; when these thresholds are met, we will have AGI".

      Not only do we not have that, I don't think it's possible to have it.

      Philosophers have known about this problem for centuries. Wittgenstein recognized that most concepts don't have precise definitions but instead behave more like family resemblances. When we look at a family we recognize that they share physical characteristics, even if there's no single characteristic shared by all of them. They don't need to unanimously share hair color, skin complexion, mannerisms, etc. in order to have a family resemblance.

      Outside of a few well-defined things in logic and mathematics, concepts operate in the same way. Intelligence isn't a well-defined concept, but that doesn't mean we can't talk about different types of human intelligence, non-human animal intelligence, or machine intelligence in terms of family resemblances.

      Benchmarks are useful tools for assessing relative progress on well-defined tasks. But the decision of what counts as AGI will always come down to fuzzy comparisons and qualitative judgments.

    • The current definition and goal of AGI is “Artificial intelligence good enough to replace every employee for cheaper” and much of the difficulty people have in defining it is cognitive dissonance about the goal.

      3 replies →

    • Because an important part of being a Natural General Intelligence is having a body and interacting with the world. Data from Star Trek is a good example of an AGI.

      1 reply →

    • The Turing test is not really that meaningful anymore, because you can always detect the AI by text and timing patterns rather than actual intelligence. In fact, the most reliable way to test for AI is probably to ask trivia questions on various niche topics; I don't think any human has as much breadth of general knowledge as current AIs.

      3 replies →

  • One of the very first slides of François’ presentation is about defining AGI. Do you have anything that opposes his synthesis of the two (50-year-old) takes on this definition?

  • I have graduated with a degree in software engineering and I am bilingual (Bulgarian and English). Currently AI is better than me at everything except adding big numbers or writing code for really niche tasks - for example, code golfing a Brainfuck interpreter or writing a Rubik's cube solver. I believe AGI has been here for at least a year now.

    • I suggest you try letting the AI think through race-condition scenarios in asynchronous programs; it is not that good at these abstract reasoning tasks.

      1 reply →

    • Can the AI wash your dishes, fold your laundry, take out your trash, meet a friend for dinner, or do the other thousand things you might do in an average day when you're not interacting with text on a screen?

      You know, stuff that humans did long before there were computers and screens.

      2 replies →

> Getting a high score on ARC doesn't mean we have AGI and Chollet has always said as much AFAIK

He only seems to have started saying this recently, since OpenAI cracked the ARC-AGI benchmark. But in the original 2019 abstract he said this:

> We argue that ARC can be used to measure a human-like form of general fluid intelligence and that it enables fair general intelligence comparisons between AI systems and humans.

https://arxiv.org/abs/1911.01547

Now he seems to be backtracking with the release of harder ARC-like benchmarks, implying that the first one didn't actually test for really general, human-like intelligence.

This sounds a bit like saying that a machine beating chess would require general intelligence -- but then adding, after Deep Blue beats chess, that chess doesn't actually count as a test for AGI, and that Go is the real AGI benchmark. And after a narrow system beats Go, moving the goalpost to beating Atari, and then to beating StarCraft II, then to MineCraft, etc.

At some point, intuitively real "AGI" will be necessary to beat one of these increasingly difficult benchmarks, but only because otherwise yet another benchmark would have been invented. Which makes these benchmarks mostly post hoc rationalizations.

A better approach would be to question what went wrong with coming up with the very first benchmark, and why a similar thing wouldn't occur with the second.

ARC is definitely about achieving AGI and it doesn't matter whether we "have" it or not right now. That is the goal:

> where he introduced the "Abstract and Reasoning Corpus for Artificial General Intelligence" (ARC-AGI) benchmark to measure intelligence

So, a high enough score is a threshold to claim AGI. And, if you use an LLM to work through these types of problems, it becomes pretty clear that passing more tests indicates a level of "awareness" that goes beyond rational algorithms.

I thought I had seen everything until I started working on some of the problems with agents. I'm still sorta in awe about how the reasoning manifests. (And don't get me wrong, LLMs like Claude still go completely off the rails where even a less intelligent human would know better.)

"Being able to solve ARC problems is probably a pre-requisite to AGI." - is it? Humans have general intelligence and most can't solve the harder ARC problems.

  • https://arcprize.org/leaderboard

    "Avg. Mturker" has 77% on ARC1 and costs $3/task. "Stem Grad" has 98% on ARC1 and costs $10/task. I would love a segment like "typical US office employee" or something else in between since I don't think you need a stem degree to do better than 77%.

    It's also worth noting that the "Human Panel" gets 100% on ARC2 at $17/task. All the "Human" models are on the score/cost frontier and exceptional in their score range, although obviously too expensive to win the prize.

    I think the real argument is that the ARC problems are too abstract and obscure to be relevant to useful AGI, but I think we need a little flexibility in that area so we can have tests that can be objectively and mechanically graded. E.g. "write a NYT bestseller" is an impractical test in many ways even if it's closer to what AGI should be.

    • > I think the real argument is that the ARC problems are too abstract and obscure to be relevant to useful AGI

      I think it's meant to work like how getting things off the top shelf at the supermarket isn't relevant to playing basketball.

  • They, and the other posters posting similar things, don't mean human-like intelligence, or even the rigorously defined solving of unconstrained problem spaces that originally defined Artificial General Intelligence (in contrast to "narrow" intelligence).

    They mean an artificial god, and it has become a god of the gaps: we have made artificial general intelligence, and it is more human-like than god-like, and so to make a god we must have it do XYZ precisely because that is something which people can't do.

My problem with AGI is the lack of a simple, concrete definition.

Can we formalize it as: give out a task expressible in, say, n^m bytes of information that encodes n^(m+q) real algorithmic and verification complexity, then require that the task be solved within certain time, compute, and attempt bounds?

Something that captures "the AI was able to unwind the underlying unspoken complexity of the novel problem".

I feel like one could map a variety of easy human "brain teaser" type tasks to heuristics that fit within some mathematical framework and then grow the formalism from there.
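
As a very rough sketch of what I mean (everything here, the names, the bounds, the whole harness, is hypothetical, just to make the idea concrete):

```python
import time
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    # Hypothetical framing: the statement handed to the solver is short
    # (on the order of n^m bytes), while verifying a solution exercises the
    # larger n^(m+q) algorithmic complexity hidden behind it.
    description: bytes               # the compact task statement
    verify: Callable[[bytes], bool]  # oracle for the underlying complexity

def passes(task: Task, solver: Callable[[bytes], bytes],
           max_seconds: float = 60.0, max_attempts: int = 3) -> bool:
    """Did the solver unwind the unspoken complexity within the given bounds?"""
    start = time.monotonic()
    for _ in range(max_attempts):
        if time.monotonic() - start > max_seconds:
            return False
        if task.verify(solver(task.description)):
            return True
    return False
```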

  • >My problem with AGI is the lack of a simple, concrete definition.

    You can't always start from definitions. There are many research areas where the object of research is to know something well enough that you could converge on such a thing as a definition, e.g. dark matter, consciousness, intelligence, colony collapse syndrome, SIDS. We can nevertheless progress in our understanding of them in a whole motley of strategic ways: studying the cases that best exhibit salient properties, tracing the outer boundaries of the problem space, tracking the central cluster of "family resemblances" that seem to characterize the problem, entertaining candidate explanations that are closer or further away, and so on. Essentially a practical attitude.

    I don't doubt in principle that we could arrive at such a thing as a definition that satisfies most people, but I suspect you're more likely to have that at the end than the beginning.

  • This is one of those cases where defining it and solving it are the same: if you know how to define it, then you've solved it.

  • After researching this a fair amount, my opinion is that consciousness/intelligence (can you have one without the other?) emerges from some sort of weird entropy exchange in domains in the brain. The theory goes that we aren't conscious, but we DO consciousness, sometimes. Maybe entropy, or the inverse of it, gives way to intelligence, somehow.

    This entropy angle has real theoretical backing. Some researchers propose that consciousness emerges from the brain's ability to integrate information across different scales and timeframes, which would essentially create temporary "islands of low entropy" in neural networks. Giulio Tononi's Integrated Information Theory suggests consciousness corresponds to a system's ability to generate integrated information, which relates to how it reduces uncertainty (entropy) about its internal states. Then there are Hameroff and Penrose, whom I commented about on here years ago and got blasted for. Meh. I'm a learner, and I learn by entertaining truths. But I always remain critical of theories until I'm sold.

    I'm not selling any of this as a truth, because the fact remains we have no idea what "consciousness" is. We have a better handle on "intelligence", but as others point out, most humans aren't that intelligent. They still manage to drive to the store and feed their dogs, however.

    A lot of the current leading ARC solutions use random sampling, which sorta makes sense once you start thinking about having to handle all the different types of problems. At least it seems to help pare down the decision tree; a toy sketch of the idea is below.
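
    For what it's worth, here is a toy illustration of that kind of random-sampling program search over a tiny, made-up DSL of grid transforms; real solvers are far more sophisticated, this just shows the shape of the idea:

    ```python
    import random

    # Toy DSL of grid transforms; grids are tuples of tuples of ints.
    def rotate(g):   return tuple(zip(*g[::-1]))   # rotate 90 degrees clockwise
    def flip_h(g):   return tuple(row[::-1] for row in g)
    def flip_v(g):   return tuple(g[::-1])
    def identity(g): return g

    PRIMITIVES = [rotate, flip_h, flip_v, identity]

    def sample_program(max_len=3):
        """Randomly sample a short composition of primitives."""
        return [random.choice(PRIMITIVES) for _ in range(random.randint(1, max_len))]

    def run(program, grid):
        for op in program:
            grid = op(grid)
        return grid

    def search(train_pairs, budget=10_000):
        """Keep sampling programs until one explains every training pair."""
        for _ in range(budget):
            prog = sample_program()
            if all(run(prog, inp) == out for inp, out in train_pairs):
                return prog
        return None

    # Usage: the hidden rule in this single training pair is a 90-degree
    # clockwise rotation of the grid.
    pairs = [(((1, 2), (0, 0)), ((0, 1), (0, 2)))]
    found = search(pairs)
    print([f.__name__ for f in (found or [])])
    ```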

I'm all for benchmarks that push the field forward, but ARC problems seem to be difficult for reasons that have less to do with intelligence and more to do with having a text system that works reliably with rasterized pixel data presented line by line. Most people would score 0 on it if they were shown the data the way an LLM sees it; these problems only seem easy to us because there are visualizers slapped on top.
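
To make that concrete, here is roughly what a small ARC-style example looks like once it has been flattened into the kind of text an LLM actually receives (the exact format is just an illustration; real harnesses vary):

```python
# A small ARC-style example: each grid is a list of rows of color indices.
example = {
    "input":  [[0, 0, 3],
               [0, 3, 0],
               [3, 0, 0]],
    "output": [[3, 0, 0],
               [0, 3, 0],
               [0, 0, 3]],
}

def serialize(grid):
    """Flatten a grid into the row-by-row text stream an LLM is given."""
    return "\n".join(" ".join(str(cell) for cell in row) for row in grid)

prompt = "INPUT:\n" + serialize(example["input"]) + "\nOUTPUT:\n" + serialize(example["output"])
print(prompt)
# The model sees a stream of digit tokens with no 2D structure beyond the
# newlines; the "picture" only exists once a visualizer is slapped on top.
```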

  • What is a visualisation?

    Our rod and cone cells could just as well be wired up in any other configuration you care to imagine. And yet, an organisation or mapping that preserves spatial relationships has been strongly preferred over billions of years of evolution, allowing us most easily to make sense of the world. Put another way, spatial feature detectors have emerged as an incredibly versatile substrate for ‘live-action’ generation of world models.

    What do we do when we visualise, then? We take abstract relationships (in data, in a conceptual framework, whatever) and map them in a structure-preserving way to an embodiment (ink on paper, pixels on screen) that can wind its way through our perceptual machinery that evolved to detect spatial relationships. That is, we leverage our highly developed capability for pattern matching in the visual domain to detect patterns that are not necessarily visual at all, but which nevertheless have some inherent structure that is readily revealed that way.

    What does any of this entail for machine intelligence?

    On the one hand, if a problem has an inherent spatial logic to it, then it ought to have good learning gradients in the direction of a spatial organisation of the raw input. So, if specifically training for such a problem, the serialisation probably doesn’t much matter.

    On the other hand: expecting a language model to generalise to inherently spatial reasoning? I’m totally with you. Why should we expect good performance?

    No clue how the unification might be achieved, but I’d wager that language + action-prediction models will be far more capable than models grounded in language alone. After all, what does ‘cat’ mean to a language model that’s never seen one pounce and purr and so on? (Pictures don’t really count.)