Comment by qoez

7 days ago

I feel like I'm the only one who isn't convinced that getting a high score on the ARC eval test means we have AGI. It's mostly about pattern matching (and some of it is ambiguous even for humans as to what the true response ought to be). It's like how in humans there are lots of different 'types' of intelligence, and overfitting on IQ tests doesn't convince me a person is actually that smart.

Getting a high score on ARC doesn't mean we have AGI, and Chollet has always said as much AFAIK; it's meant to push the AI research space in a positive direction. Being able to solve ARC problems is probably a prerequisite for AGI. It's a directional push into the fog of war, with the claim being that we should explore that area because we expect it's relevant to building AGI.

  • We don't really have a true test that means "if we pass this test, we have AGI", but we have a variety of tests (like ARC) that we believe any true AGI would be able to pass. It's a "necessary but not sufficient" situation. This also ties directly to the challenge of defining what AGI really means. You see a lot of discussion of "moving the goal posts" around AGI, but as I see it we've never had goal posts; we've just got a bunch of lines we'd expect to cross before reaching them.

    • I don't think we even have a good definition of "This is what AGI is, and here are the stationary goal posts such that, when these thresholds are met, we will have AGI".

      If you judged human intelligence by our AI standards, then would humans even pass as Natural General Intelligence? Human intelligence tests are constantly changing, being invalidated, and rerolled as well.

      I maintain that today's LLMs would pass sufficiently for AGI, and are also very close to passing a Turing Test, if measured by the standards of 1950, when the test was proposed.

      12 replies →

    • One of the very first slides of François’ presentation is about defining AGI. Do you have anything that opposes his synthesis of the two 50-year-old takes on this definition?

    • I graduated with a degree in software engineering and I am bilingual (Bulgarian and English). Currently AI is better than me at everything except adding big numbers or writing code on really niche topics - for example, code golfing a Brainfuck interpreter or writing a Rubik's Cube solver. I believe AGI has been here for at least a year now.

      5 replies →

  • > Getting a high score on ARC doesn't mean we have AGI and Chollet has always said as much AFAIK

    He only seems to say this recently, since OpenAI cracked the ARC-AGI benchmark. But in the original 2019 abstract he said this:

    > We argue that ARC can be used to measure a human-like form of general fluid intelligence and that it enables fair general intelligence comparisons between AI systems and humans.

    https://arxiv.org/abs/1911.01547

    Now he seems to backtrack, with the release of harder ARC-like benchmarks, implying that the first one didn't actually test for really general human-like intelligence.

    This sounds a bit like saying that beating humans at chess would require general intelligence -- but then adding, after Deep Blue wins, that chess doesn't actually count as a test for AGI, and that Go is the real AGI benchmark. And after a narrow system beats Go, moving the goalpost to beating Atari games, then to StarCraft II, then to Minecraft, etc.

    At some point, intuitively real "AGI" will be necessary to beat one of these increasingly difficult benchmarks, but only because otherwise yet another benchmark would have been invented. Which makes these benchmarks mostly post hoc rationalizations.

    A better approach would be to question what went wrong with coming up with the very first benchmark, and why a similar thing wouldn't occur with the second.

    • Relative to humans, these models sure have ungodly amounts of knowledge, but they also kinda have a lobotomy, in never having moved through the world. It’s remarkable they work as well as they do trained chiefly on text, but being so untethered from the only reality we know intelligence to have emerged from... frankly, what do we expect?

  • ARC is definitely about achieving AGI and it doesn't matter whether we "have" it or not right now. That is the goal:

    > where he introduced the "Abstract and Reasoning Corpus for Artificial General Intelligence" (ARC-AGI) benchmark to measure intelligence

    So, a high enough score is a threshold to claim AGI. And, if you use an LLM to work these types of problems, it becomes pretty clear that passing more tests indicates a level of "awareness" that goes beyond rational algorithms.

    I thought I had seen everything until I started working on some of the problems with agents. I'm still sorta in awe about how the reasoning manifests. (And don't get me wrong, LLMs like Claude still go completely off the rails where even a less intelligent human would know better.)

  • "Being able to solve ARC problems is probably a pre-requisite to AGI." - is it? Humans have general intelligence and most can't solve the harder ARC problems.

    • https://arcprize.org/leaderboard

      "Avg. Mturker" has 77% on ARC1 and costs $3/task. "Stem Grad" has 98% on ARC1 and costs $10/task. I would love a segment like "typical US office employee" or something else in between since I don't think you need a stem degree to do better than 77%.

      It's also worth noting the "Human Panel" gets 100% on ARC2 at $17/task. All the "Human" models are on the score/cost frontier and exceptional in their score range although too expensive to win the prize obviously.

      I think the real argument is that the ARC problems are too abstract and obscure to be relevant to useful AGI, but I think we need a little flexibility in that area so we can have tests that can be objectively and mechanically graded. E.g. "write a NYT bestseller" is an impractical test in many ways even if it's closer to what AGI should be.

      1 reply →

    • They, and the other posters posting similar things, don't mean human-like intelligence, or even the rigorously defined solving of unconstrained problem spaces that originally defined Artificial General Intelligence (in contrast to "narrow" intelligence).

      They mean an artificial god, and it has become a god of the gaps: we have made artificial general intelligence, and it is more human-like than god-like, and so to make a god we must have it do XYZ precisely because that is something which people can't do.

      2 replies →

  • My problem with AGI is the lack of a simple, concrete definition.

    Can we formalize it as giving out a task expressible in, say, n^m bytes of information that encodes a task of n^(m+q) real algorithmic and verification complexity -- then solving that task within a certain time, compute, and attempt bounds?

    Something that captures "the AI was able to unwind the underlying unspoken complexity of the novel problem".

    I feel like one could map a variety of easy human "brain teaser" type tasks to heuristics that fit within some mathematical framework and then grow the formalism from there.
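
    One loose way to write that down (purely my own sketch; the description function d, the complexity measure K, and the budgets t, c, k are invented for illustration):

      |d(T)| \le n^{m} \text{ bytes}, \qquad K(T) \ge n^{m+q}, \qquad \Pr\big[\, S \text{ solves } T \text{ within time } t,\ \text{compute } c,\ k \text{ attempts} \,\big] \ge p

    That is, the system has to reliably unwind far more complexity than was literally handed to it in the task description.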

    • >My problem with AGI is the lack of a simple, concrete definition.

      You can't always start from definitions. There are many research areas where the object of research is to know something well enough that you could converge on such a thing as a definition, e.g. dark matter, consciousness, intelligence, colony collapse syndrome, SIDS. We nevertheless can progress in our understanding of them in a whole motley of strategic ways, by case studies that best exhibit salient properties, trace the outer boundaries of the problem space, track the central cluster of "family resemblances" that seem to characterize the problem, entertain candidate explanations that are closer or further away, etc. Essentially a practical attitude.

      I don't doubt in principle that we could arrive at such a thing as a definition that satisfies most people, but I suspect you're more likely to have that at the end than the beginning.

    • One of those cases where defining it and solving it are the same thing. If you know how to define it, then you've solved it.

    • After researching this a fair amount, my opinion is that consciousness/intelligence (can you have one without the other?) emerges from some sort of weird entropy exchange in domains in the brain. The theory goes that we aren't conscious, but we DO consciousness, sometimes. Maybe entropy, or the inverse of it, gives way to intelligence, somehow.

      This entropy angle has real theoretical backing. Some researchers propose consciousness emerges from the brain's ability to integrate information across different scales and timeframes. This would essentially create temporary "islands of low entropy" in neural networks. Giulio Tononi's Integrated Information Theory suggests consciousness corresponds to a system's ability to generate integrated information, which relates to how it reduces uncertainty (entropy) about its internal states. Then there is the Hameroff and Penrose theory, which I commented about on here years ago and got blasted for. Meh. I'm a learner, and I learn by entertaining truths. But I always remain critical of theories until I'm sold.

      I'm not selling any of this as a truth, because the fact remains we have no idea what "consciousness" is. We have a better handle on "intelligence", but as others point out, most humans aren't that intelligent. They still manage to drive to the store and feed their dogs, however.

      A lot of the current leading ARC solutions use random sampling, which sorta makes sense once you start thinking about having to handle all the different types of problems. At least it seems to be helping out in paring down the decision tree.
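
      To make "random sampling" concrete, here's a minimal sketch of the general idea (my own toy illustration, not any particular team's solver; the primitive set and the budget are made up): sample candidate programs, keep only those consistent with every training pair, and apply a survivor to the test input.

        import random
        import numpy as np

        # Toy primitive transformations; real solvers search a much richer space.
        PRIMITIVES = [
            lambda g: np.rot90(g),   # rotate 90 degrees
            lambda g: np.fliplr(g),  # mirror left-right
            lambda g: np.flipud(g),  # mirror top-bottom
            lambda g: g.T,           # transpose
        ]

        def sample_program(max_len=3):
            """Randomly compose a few primitives into a candidate program."""
            return [random.choice(PRIMITIVES) for _ in range(random.randint(1, max_len))]

        def run(program, grid):
            g = np.array(grid)
            for step in program:
                g = step(g)
            return g.tolist()

        def solve(train_pairs, test_input, budget=10_000):
            """Sample until some program reproduces every training pair, then apply it."""
            for _ in range(budget):
                prog = sample_program()
                if all(run(prog, p["input"]) == p["output"] for p in train_pairs):
                    return run(prog, test_input)
            return None  # nothing consistent found within the sampling budget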

  • I'm all for benchmarks that push the field forward, but ARC problems seem to be difficult for reasons that have less to do with intelligence and more to do with having a text system that works reliably with rasterized pixel data presented line by line. Most people would score 0 on it if they were shown the data the way an LLM sees it; these problems only seem easy to us because there are visualizers slapped on top.
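
    For concreteness, here's roughly what a tiny task looks like before any visualizer is involved (the grids below are made up, but real ARC tasks are JSON grids of integer color codes 0-9, and the row-by-row serialization is just one plausible way to feed them to a text model):

      # A made-up mini task in the ARC JSON shape: grids are lists of rows of color codes 0-9.
      task = {
          "train": [
              {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
          ],
          "test": [{"input": [[2, 0], [0, 2]]}],
      }

      # One plausible serialization: flatten the grid row by row into plain text.
      def serialize(grid):
          return "\n".join(" ".join(str(cell) for cell in row) for row in grid)

      print(serialize(task["train"][0]["input"]))
      # 0 1
      # 1 0

    A grid of up to 30x30 cells becomes hundreds of digit tokens with no inherent 2D structure, which is the point being made above.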

    • What is a visualisation?

      Our rod and cone cells could just as well be wired up in any other configuration you care to imagine. And yet, an organisation or mapping that preserves spatial relationships has been strongly preferred over billions of years of evolution, allowing us most easily to make sense of the world. Put another way, spatial feature detectors have emerged as an incredibly versatile substrate for ‘live-action’ generation of world models.

      What do we do when we visualise, then? We take abstract relationships (in data, in a conceptual framework, whatever) and map them in a structure-preserving way to an embodiment (ink on paper, pixels on screen) that can wind its way through our perceptual machinery that evolved to detect spatial relationships. That is, we leverage our highly developed capability for pattern matching in the visual domain to detect patterns that are not necessarily visual at all, but which nevertheless have some inherent structure that is readily revealed that way.

      What does any of this entail for machine intelligence?

      On the one hand, if a problem has an inherent spatial logic to it, then it ought to have good learning gradients in the direction of a spatial organisation of the raw input. So, if specifically training for such a problem, the serialisation probably doesn’t much matter.

      On the other hand: expecting a language model to generalise to inherently spatial reasoning? I’m totally with you. Why should we expect good performance?

      No clue how the unification might be achieved, but I’d wager that language + action-prediction models will be far more capable than models grounded in language alone. After all, what does ‘cat’ mean to a language model that’s never seen one pounce and purr and so on? (Pictures don’t really count.)

In the video, François Chollet, creator of the ARC benchmarks, says that beating ARC does not equate to AGI. He specifically says they will be able to be beaten without AGI.

  • He only says this because otherwise he would have to say that

    - OpenAI's o3 counts as "AGI", since it unexpectedly beat the ARC-AGI benchmark, or

    - Explicitly admit that he was wrong when assuming that ARC-AGI would test for AGI

    • FWIW the original ARC was published in 2019, just after GPT-2 but a while before GPT-3. I work in the field, and I think discussing AGI seriously is actually kind of a recent thing (I'm not sure I ever heard the term 'AGI' until a few years ago). I'm not saying I know he didn't feel that way, but he doesn't talk in such terms in the original paper.

      3 replies →

I think the people behind the ARC Prize agree that getting a high score doesn't mean we have AGI. (They already updated the benchmark once to make it harder.) But an AGI should score about as high as humans do. So current models that get very low scores are definitely not AGI, and are likely quite far away from it.

  • > I think the people behind the ARC Prize agree that getting a high score doesn't mean we have AGI

    The benchmark was literally called ARC-AGI. Only after OpenAI cracked it did they start backtracking and saying that it doesn't test for true AGI. Which undermines the whole premise of the benchmark.

https://en.m.wikipedia.org/wiki/AI_effect

But on a serious note, I don't think Chollet would disagree. ARC is a necessary but not sufficient condition, and he says that, despite the unfortunate attention-grabbing name choice of the benchmark. I like Chollet's view that we will know that AGI is here when we can't come up with new benchmarks that separate humans from AI.

I agree with you but I'll go a step further - these benchmarks are a good example of how far we are from AGI.

A good base test would be to give a manager a mixed team of remote workers, half being human and half being AI, and seeing if the manager or any of the coworkers would be able to tell the difference. We wouldn't be able to say that AI that passed that test would necessarily be AGI, since we would have to test it in other situations. But we could say that AI that couldn't pass that test wouldn't qualify, since it wouldn't be able to successfully accomplish some tasks that humans are able to.

But of course, current AI is nowhere near that level yet. We're left with benchmarks, because we all know how far away we are from actual AGI.

  • The AGI test I think makes sense is to put it in a robot body and let it navigate the world. Can I take the robot to my back yard and have it weed my vegetable garden? Can I show it how to fold my laundry? Can I take it to the grocery store and tell it "go pick up 4 yellow bananas and two avocados that will be ready to eat in the next day or two, and then meet me in dairy"? Can I ask it to dice an onion for me during meal prep?

    These are all things my kids would do when they were pretty young.

    • I agree; I think of that as the next level beyond the digital assistant test - a physical assistant test. Once there are sufficiently capable robots, hook one up to the AI. Tell it to mow your lawn, drive your car to the mechanic and have it checked, box up an item, take it to the post office and have it shipped, pick up your dry cleaning, buy ingredients from a grocery store, cook dinner, etc. Basic tasks a low-skilled worker would do as someone's assistant.

    • I think the next harder level in AGI testing would be “convince my kids to weed the garden and fold the laundry” :-)

      1 reply →

  • The problem with "spot the difference" tests, imho, is that I would expect an AGI to be easily spotted. There's going to be a speed of calculation difference, at the very least. If nothing else, typing speed would be completely different unless the AGI is supposed to be deceptive. Who knows what it's personality would be like. I'd say it's a simple enough test just to see if an AGI could be hired as, for example, an entry level software developer and keep it's job based on the same criteria base-level humans have to meet.

    I agree that current AI is nowhere near that level yet. If AI isn't even trying to extract meaning from the words it smiths or the pictures it diffuses then it's nothing more than a cute (albeit useful) parlor trick.

    • Those could probably be mitigated pretty easily in testing situations. For example, making sure all participants had a delay in chat conversations, or running correspondence through an LLM to equalize the personality.

      However, I'm not sure an AGI test should be mitigating them. If an AI isn't able to communicate at human speeds, or isn't able to achieve the social understandings that a human does, it would probably be wrong to say that it has the same intelligence capabilities as a human (how AGI has traditionally been defined). It wouldn't be able to provide human level performance in many jobs.

  • Why even bother with the people in the mix? Just tell the AGI: make as much money as you can in 6 months. Preferably without breaking any laws.

I think next year's AI benchmarks are going to be like this project: https://www.anthropic.com/research/project-vend-1

Give the AI tools and let it do real stuff in the world:

"FounderBench": Ask the AI to build a successful business, whatever that business may be - the AI decides. Maybe try to get funded by YC - hiring a human presenter for Demo Day is allowed. They will be graded on profit / loss, and valuation.

Testing a plain LLM on whiteboard-style questions is meaningless now. Going forward, it will all be multi-agent systems with computer use, long-term memory & goals, and delegation.

  • This sounds like a terrible idea to me: you're training an intelligent computer to aim for power. It's fine as long as they're bad at it, but if they get good then we have a problem.

You're not alone in this; I suspect we have not yet enumerated all the things that we ourselves mean by "intelligence".

But conversely, not passing this test is a proof of not being as general as a human's intelligence.

  • I find the "what is intelligence?" discussion a little pointless if I'm honest. It's similar to asking a question like does it mean to be a "good person" and would we know whether an AI or person is really "good"?

    While understanding why a person or AI is doing what it's doing can be important (perhaps specifically in safety contexts) at the end of the day all that's really going to matter to most people is the outcomes.

    So if an AI can use what appears to be intelligence to solve general problems and can act in ways that are broadly good for society, whether or not it meets some philosophical definition of "intelligent" or "good" doesn't matter much – at least in most contexts.

    That said, my own opinion on this is that the truth is likely in between. LLMs today seem extremely good at being glorified auto-completes, and I suspect most (95%+) of what they do is just recalling patterns in their weights. But unlike traditional auto-completes they do seem to have some ability to reason and solve truly novel problems. As it stands I'd argue that ability is fairly poor, but this might only represent 1-2% of what we use intelligence for.

    If I were to guess why this is, I suspect it's not that LLM architecture today is completely wrong, but that the way LLMs are trained means that in general knowledge recall is rewarded more than reasoning. This is similar to the trade-off we humans have with education – do you prioritise the acquisition of knowledge or critical thinking? Many believe critical thinking is more important and should be prioritised more, but I suspect that for the vast majority of tasks we're interested in solving, knowledge storage and recall are actually more important.

    • That's certainly a valid way of looking at their abilities at any given task — "The question of whether a computer can think is no more interesting than the question of whether a submarine can swim".

      But when the question is "are they going to be more important to the economy than humans?", then they have to be good at basically everything a human can do; otherwise we just see a variant of Amdahl's law in action, with the AI providing an arbitrary speed-up of n% of the economy while humans are needed for the remaining (100-n)%.

      I may be wrong, but it seems to me that the ARC prize is more about the latter.

      3 replies →

The point is not that a high score -> AGI; the idea is more that a low score -> we don't have AGI yet.

Roughly speaking, the job of a medical doctor is to diagnose the patient - and then, after the diagnosis is made, to apply the healing from the book, corresponding to the diagnosis.

The diagnosis is pattern matching (again, roughly). It kinda suggests that a lot of "intelligent" problems are focused on pattern matching, and (relatively straightforward) application of "previous experience". So, pattern matching can bring us a great deal towards AGI.

  • Pattern matching is instinct. (Or at least, instinct is a kind of pattern matching. And once you learn the patterns, pattern matching can become almost instinctual). And that's fine, for things that fit the pattern. But a human-level intelligence can also deal with problems for which there is no pattern. (I mean, not always successfully - finding a correct solution to a novel problem is difficult. But it is within the capability of at least some humans.)

You're not the only one. ARC-AGI is a laudable effort, but its fundamental premise is indeed debatable:

"We argue that human cognition follows strictly the same pattern as human physical capabilities: both emerged as evolutionary solutions to specific problems in specific evironments" (from page 22 of On the Measure of Intelligence)

https://arxiv.org/pdf/1911.01547

  • But because of this "uneven edge" thing people talk about - AI weaknesses not necessarily being the same as humans' - I believe that once we run out of tests on which AI is worse than humans, it will in effect already be very much superhuman. My main evidence for this is Leela Zero, the Go AI, which struggled with ladders and some other aspects of Go play well into the superhuman regime (in Go it's easier to see when a system is superhuman because you have Elo ratings, win rates, etc., and there's less room for debate).

> I feel like I'm the only one who isn't convinced getting a high score on the ARC eval test means we have AGI

Francois explicitly says that's not how ARC is supposed to be interpreted.

> It's mostly about pattern matching...

For all we know, human intelligence is just an emergent property of really good pattern matching.

If you can write code to solve ARC by "overfitting," then give it a shot! There's prize money to be won, as long as your model does a good job on the hidden test set. Zuckerberg is said to be throwing around 8-figure signing bonuses for talent like that.

But then, I guess it wouldn't be "overfitting" after all, would it?

Who says intelligence is anything more than "pattern matching"? Everything is patterns

He’s playing the game. You have to say AGI is your goal to get attention. It’s just like the YouTube thumbnail game. You can hate it, but you still have to play if you want people to pay attention.

Much like other forms of psychometry, especially related to so called intelligence, it's mainly about stratification and discrimination for ideological purposes.

I understand Chollet is transparent that the "branding" of the ARC-AGI-n suites is meant to be suggestive of their purpose rather than substantive.

However, it does rub me the wrong way, as someone who's cynical about how branding can enable breathless AI hype by bad journalism. A hypothetical comparison would be labelling SHRDLU's (1968) performance on Blocks World planning tasks as "ARC-AGI-(-1)".[0]

A less loaded name like (bad strawman option) "ARC-VeryToughSymbolicReasoning" should capture how the ARC-AGI-n suite is genuinely and intrinsically very hard for current AIs, and what progress satisfactory performance on the benchmark suite would represent. Which Chollet has articulated, and it has kept him grounded throughout! [1]

[0] https://en.wikipedia.org/wiki/SHRDLU [1] https://arxiv.org/abs/1911.01547

  • I get what you're saying about perception being reality and that ARC-AGI suggests beating it means AGI has been achieved.

    In practice, when I have seen ARC brought up, it has been treated with more nuance than any of the other benchmarks.

    Unlike Humanity's Last Exam, which is the most egregious example I have seen, both in its naming and in how it is referenced in terms of an LLM's capability.

I've said this somewhere else, but we have the perfect test for AGI in the form of any open-world game. Give the AGI instructions that it should finish the game and tell it how to control the game. Give it the frames as input and wait. When I think of the latest Zelda games, and especially how the Shrine challenges are designed, they feel like the perfect environment for an AGI test.

  • And if someone makes a machine that does all that and another person says

    "That's not really AGI because xyz"

    What then? The difficulty in coming up with a test for AGI is coming up with something that people will accept a passing grade as AGI.

    In many respects I feel like all of the claims that models don't really understand or have internal representation or whatever tend to lean on nebulous or circular definitions of the properties in question. Trying to pin the arguments down usually end up with dualism and/or religion.

    Doing what Chollet has done is infinitely better: if a person can easily do something and a model cannot, then there is clearly something significant missing.

    It doesn't matter what the property is or what it is called. Such tests might even help us see what those properties are.

    Anyone who wants to claim the fundamental inability of these models should be able to provide a task for which it is clearly possible to tell when it has been solved, and to show that humans can do it (if that's the bar we are claiming can't be met). If they are right, then no future model should be able to solve that class of problems.

    • Given your premise (which I agree with), I think the issue in general comes from the lack of a good, broadly accepted definition of what AGI is. My initial comment originates from the fact that in my internal definition, an AGI would have a de facto understanding of the physics of "our world". Or better, it could infer them by trial and error. But, indeed, that doesn't have to be the case. (The other advantage of the Zelda games is that they introduce new abilities that don't exist in our world, and most children I've seen understand the mechanisms and how they could be applied to solve a problem quite naturally, even though they've never had that ability before.)

      3 replies →

    • > It doesn't matter what the property is or what it is called. Such tests might even help us see what those properties are.

      This is a very good point and somewhat novel to me in its explicitness.

      There's no reason to think that we already have the concepts and terminology to point out the gaps between the current state and human-level intelligence and beyond. It's incredibly naive to think we have already armchair-generated those concepts by pure self-reflection and philosophizing. This is obvious in fields like physics: experiments were necessary to even come up with the basic concepts of electromagnetism, relativity, or quantum mechanics.

      I think the reason is that pure philosophizing is still more prestigious than getting down in the weeds and dirt and doing limited-scope well-defined experiments on concrete things. So people feel smart by wielding poorly defined concepts like "understanding" or "reasoning" or "thinking", contrasting it with "mere pattern matching", a bit like the stalemate that philosophy as a field often hits, as opposed to the more pragmatic approach in the sciences, where empirical contact with reality allows more consensus and clarity without getting caught up in mere semantics.

    • > The difficulty in coming up with a test for AGI is coming up with something that people will accept a passing grade as AGI.

      The difficulty with intelligence is we don't even know what it is in the first place (in a psychology sense, we don't even have a reliable model of anything that corresponds to what humans point at and call intelligence; IQ and g are really poor substitutes).

      Add into that Goodhart's Law (essentially, propose a test as a metric for something, and people will optimize for the test rather than what the test is trying to measure), and it's really no surprise that there's no test for AGI.

Well, for most, the next steps are probably towards removing the highly deterministic and discrete characteristics of current approaches (we certainly don't think in lockstep). Those have no measures. Even the creative aspect is undermined by those characteristics.

You're not alone in this, no.

My definition of AGI is the one I was brought up with, not an ever moving goal post (to the "easier" side).

And no, I also don't buy that we are just stochastic parrots.

But whatever. I've seen many hypes and if I don't die and the world doesn't go to shit, I'll see a few more in the next couple of decades

> I feel like I'm the only one who isn't convinced getting a high score on the ARC eval test means we have AGI.

Wait, what? Approximately nobody is claiming that "getting a high score on the ARC eval test means we have AGI". It's a useful eval for measuring progress along the way, but I don't think anybody considers it the final word.

Today’s LLMs are fancy autocomplete but lack test-time self-learning and persistent drive. By contrast, an AGI would require:

– A goal-generation mechanism (G) that can propose objectives without external prompts

– A utility function (U) and policy π(a|s) enabling action selection and hierarchy formation over extended horizons

– Stateful memory (M) + feedback integration to evaluate outcomes, revise plans, and execute real-world interventions autonomously

Without G, U, π, and M operating, LLMs remain reactive statistical predictors, not human-level intelligence.
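
To make the G / U / π / M framing concrete, here is a minimal sketch of how those pieces might slot together (my own toy illustration; the ToyAgent class, its number-line "world", and all names are invented, not anyone's proposed architecture):

    import random
    from dataclasses import dataclass, field

    @dataclass
    class ToyAgent:
        """Toy loop wiring together G, U, pi, and M. The 'world' is just a number line."""
        memory: list = field(default_factory=list)    # M: record of (state, action, new_state)

        def generate_goal(self, state):               # G: self-proposed objective (a random target)
            return state + random.randint(-10, 10)

        def utility(self, state, goal):               # U: higher is better; 0 means goal reached
            return -abs(goal - state)

        def policy(self, state, goal):                # pi(a|s): step toward the current goal
            return 1 if goal > state else -1

        def run(self, state=0, steps=50):
            goal = self.generate_goal(state)
            for _ in range(steps):
                if self.utility(state, goal) == 0:    # goal satisfied: propose a new one (persistent drive)
                    goal = self.generate_goal(state)
                action = self.policy(state, goal)
                new_state = state + action            # the "environment" update
                self.memory.append((state, action, new_state))  # feedback integration
                state = new_state
            return state, self.memory

    ToyAgent().run()

The point of the sketch is only that the loop never waits for an external prompt: goals, actions, and memory updates are generated internally, which is exactly what current chat-style LLM deployments don't do on their own.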