Comment by bhouston
4 days ago
If you showed someone 5 years ago what our computers can do with the latest LLMs now, they would probably say it sure looks a lot like AGI.
We have to keep defining AGI upwards, or nitpicking it, to show that we haven't achieved it.
I would argue that LLMs are actually smarter than the majority of humans right now. LLMs do not have quite the agency that humans have, but their intelligence is pretty decent.
We don't have clear ASI yet, but we are definitely in an AGI era.
I think we are missing an ego/motivations in the AGI, and self-sufficiency independent of us, but that is just a bit of engineering that would actually make them more dangerous; it isn't really a significant scientific hurdle.
Ok, but it's not AGI. People five years ago would have been wrong. People who don't have all the information are often wrong about things.
ETA:
You updated your comment, which is fine but I wanted to reply to your points.
> I would argue that LLMs are actually smarter than the majority of humans right now. LLMs do not have quite the agency that humans have, but their intelligence is pretty decent.
I would actually argue that they are decidedly not smarter than even dumb humans right now. They're useful, but they are glorified text predictors. Yes, they have more individual facts memorized than the average person, but that's not the same thing; Wikipedia, even before LLMs, also had many more facts than the average person, but you wouldn't say that Wikipedia is "smarter" than a human, because that doesn't make sense.
Intelligence isn't just about memorizing facts, it's about reasoning. The recent Esolang benchmarks indicate that these LLMs are actually pretty bad at that.
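To make the "text predictor" framing concrete, here is a toy sketch: a bigram model that only ever emits the token it most often saw after the current one. This is a deliberately crude illustration of the next-token-prediction training objective, not how any real LLM is implemented; the function names are my own.

```python
from collections import defaultdict, Counter

# Toy "text predictor": count which word follows which, then always
# predict the most frequent follower. Real LLMs are vastly more
# sophisticated, but the objective has the same shape: given the
# preceding tokens, predict the next one.

def train_bigrams(text):
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    if word not in counts:
        return None  # never seen this word: no prediction possible
    return counts[word].most_common(1)[0][0]

model = train_bigrams("the cat sat on the mat and the cat slept")
print(predict_next(model, "the"))  # "cat" follows "the" most often
```

The point of the sketch is the failure mode: the model can only ever reproduce statistics of its training text, which is roughly the "memorized facts vs. reasoning" distinction being argued here.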
> We don't have clear ASI yet, but we definitely are in a AGI-era.
Nah, not really.
> They're useful but they are glorified text predictors.
There is a long history of people arguing that intelligence is actually the ability to predict accurately.
https://www.explainablestartup.com/2017/06/why-prediction-is...
> Intelligence isn't just about memorizing facts, it's about reasoning.
Initially, LLMs were basically intuitive predictors, but with chain of thought and more recently agentic experimentation, we do have reasoning in our LLMs that is quite human like.
That said, there is definitely a bias towards training-set material, but that is also the case with the large majority of humans.
For the Esolang benchmarks, I would be curious how much adding a SKILLS.md file for each language would boost performance?
I am pretty confident that we are in the AGI era. It is unsettling and I think it gives people cognitive dissonance, so we want to deny it, nitpick it, etc.
> There is a long history of people arguing that intelligence is actually the ability to predict accurately.
That page describes a few recent CS people in AI arguing intelligence is being able to predict accurately which is like carpenters declaring all problems can be solved with a hammer.
AI "reasoning" is human-like in the sense that it is similar to how humans communicate reasoning, but that's not how humans mentally reason.
1 reply →
> There is a long history of people arguing that intelligence is actually the ability to predict accurately.
There sure is, and in psychological circles it appears there's an argument that that is not the case.
https://gwern.net/doc/psychology/linguistics/2024-fedorenko....
> Initially, LLMs were basically intuitive predictors, but with chain of thought and more recently agentic experimentation, we do have reasoning in our LLMs that is quite human like.
If you handwave the details away, then sure, it's very human-like, though the reasoning models just kind of feed the dialog back to themselves to get something more accurate. I use Claude Code like everyone else, and it will get stuck on the strangest details that humans simply wouldn't.
> For the Esolang benchmarks, I would be curious how much adding a SKILLS.md file for each language would boost performance?
Tough to say since I haven't done it, though I suspect it wouldn't help much, since there's still basically no training data for advanced programs in these languages.
> I am pretty confident that we are in the AGI era. It is unsettling and I think it gives people cognitive dissonance so we want to deny it and nitpick it, etc.
Even if you're right about this being the AGI era, that doesn't mean that current models are AGI, at least not yet. It feels like you're actively trying to handwave away details.
1 reply →
What does AGI look like in your opinion?
Personally, I've used LLMs to debug hard-to-track code issues and AWS issues among other things.
Regardless of whether that was done via next-token prediction or not, it definitely looked like AGI, or at least very close to it.
Is it infallible? Not by a long shot. I always have to double-check everything, but at least it gave me solid starting points to figure out said issues.
It would've taken me probably weeks to figure that out without LLMs, instead of the 1 or 2 hours it did.
In that context, I have a hard time imagining what a "real" AGI system would look like that isn't the current one.
Not saying current LLMs are unequivocally AGI, but they are darn close for sure IMO.
> What does AGI look like in your opinion?
Being able to actually reason about things without exabytes of training data would be one thing. Hell, even with exabytes of training data, doing actual reasoning for novel things that aren't just regurgitating things from Github would be cool.
Being able to learn new things would be another. LLMs don't learn; they're a pretrained model (it's in the name: GPT) that you send inputs into and get an output from. RAG is cool, but it's not really "learning"; it's just eating a bit more context in order to give a facsimile of learning.
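The "RAG isn't learning" claim can be sketched in a few lines: nothing about the model changes; retrieved snippets are simply prepended to the prompt. This is a deliberately naive keyword-overlap retriever (real systems use embeddings), and every name here is my own illustration rather than any particular library's API.

```python
# Toy RAG sketch: retrieval just stuffs extra text into the prompt.
# The model's weights are untouched, which is the sense in which
# nothing is "learned".

def retrieve(query, documents, k=2):
    # Score each document by word overlap with the query.
    def score(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(documents, key=score, reverse=True)[:k]

def build_rag_prompt(query, documents):
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "The deploy script lives in tools/deploy.sh",
    "Our API rate limit is 100 requests per minute",
    "Lunch is at noon on Fridays",
]
print(build_rag_prompt("what is the API rate limit", docs))
```

Whatever one thinks of the "facsimile of learning" framing, the mechanism really is this shape: the new information lives in the context window and is gone on the next fresh conversation.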
Going to the extreme of what you're saying, then `grep` would be "darn close to AGI". If I couldn't grep through logs, it might have taken me years to go through and find my errors or understand a problem.
I think they're very neat, but ultimately pretty straightforward input-output functions.
1 reply →
If we had AGI we wouldn't need to keep spending more and more money to train these models; they could just solve arbitrary problems through logic and deduction like any human. Instead, the only way to make them good at something is to encode millions of examples into text or find some other technique to tune them automatically (e.g. verifiable reward modeling with computer systems).
Why is it that LLMs could ace nearly every written test known to man, but need specialized training in order to do things like reliably type commands into a terminal or competently navigate a computer? A truly intelligent system should be able to 0-shot those types of tasks, or in the absolute worst case 1-shot them.
1 reply →
> The recent Esolang benchmarks indicate that these LLMs are actually pretty bad at that.
I’m really not sure how well a typical human would do writing Brainfuck. It’d take me a long time to write some pretty basic things in a bunch of those languages, and I’m an SE.
Yes, but you also wouldn't need a corpus of hundreds of thousands of projects to crib from. If it were truly able to "reason", then conceivably it could look at a language spec and learn how to express things in terms of Brainfuck.
1 reply →
My definition of AGI hasn't changed - it's something that can perform, or learn to perform, any intellectual task that a human can.
5 years ago we thought that language is the be-all and end-all of intelligence and treated it as the most impressive thing humans do. We were wrong. We now have these models that are very good at language, but still very bad at tasks that we wrongly considered prerequisites for language.
> My definition of AGI hasn't changed - it's something that can perform, or learn to perform, any intellectual task that a human can.
Wait, could you make your qualifiers specific here? Is your definition of AGI that it be able to perform/learn any intellectual task that is achievable by every human, or by any human?
Those are almost incomparably different standards. For the first, a nascent AGI would only need to perform a bit better than a "profound intellectual disability" level. For the second, AGI would need to be a real "Renaissance AGI," capable of advancing the frontiers of thought in every discipline, but at the same time every human would likely fail that bar.
Your true average human is someone like your barista at Starbucks. Try giving them a good math problem, or logic puzzle, or leetcode problem if you need some reminding of the standard reasoning capabilities of our species. LLMs cannot beat the best humans at practically anything, but average humans? Average humans are a much softer target than this thread seems to think.
7 replies →
> If you showed someone what our computers can do with the latest LLMs now to someone 5 years ago they would probably say it sure looks a lot like AGI.
Would they? Perhaps if you only showed them glossy demos that obscure all the ways in which LLMs fail catastrophically and are very obviously nowhere even close to AGI.
Certainly, they wouldn't expect that an AI able to score 150 on an IQ test is unable to play a casual game of chess because it isn't coherent enough to play without making illegal moves.
> Certainly, they wouldn't expect that an AI able to score 150 on an IQ test is unable to play a casual game of chess because it isn't coherent enough to play without making illegal moves.
To be fair, I am pretty sure Claude Code will download and run Stockfish if you task it to play chess with you. It's not like a human who read 100 books about chess but never played would be able to play well with their eyes closed and someone whispering board positions into their ear.
There are a lot of problems with this analogy, but even if you were to take a photo of the board after every move and send it to the model, it would still be unable to play competently.
It doesn't look anything like AGI and no one who knows what that means would be confused in any era.
Is it useful? Yes. Is it as smart as a person? Not even remotely. It can't even remember things it was already told 5 minutes ago, sometimes even when they are still in the context window, uncompacted!
It doesn’t need to be human level, and if I walk into a room and forget why I went in am I no longer a general intelligence?
If it doesn't need to be human level then what are we even talking about? AGI means human level. Everything else is AI
2 replies →
No they aren't
ChatGPT Health failed hilariously badly at just spotting emergencies.
A few weeks ago most of them failed hilariously badly at the question of whether you should drive or walk to the service station if you want to wash your car.
Idk about the health story, but in my use, ChatGPT has dramatically improved my understanding of my health issues and given sound and careful advice.
The second question sounds like a useless and artificial metric to judge on. The average person might miss such a “gotcha” logical quiz too, for the same reason - because they expect to be asked “is it walking distance.”
No one has ever relied on anyone else’s judgment, nor an AI, to answer “should I bring my car to the carwash.” Same for the ol’ “how many rocks shall I eat?” that people got the AI Overview tricked with.
I’m not saying anything categorically “is AGI” but by relying on jokes like this you’re lying to yourself about what’s relevant.
I have been checking organic and inorganic chemistry skills in ChatGPT Pro and it is absolutely, laughably bad. It sounds good and plausible, but it is comically wrong in so many ways.
Maybe you should think twice about whether the health issues advice it is giving you is legitimate.
It gave dangerous shitty advice to patients in critical conditions
https://www.bmj.com/content/392/bmj.s438
I would accuse you of nitpicking. My experience is that LLMs are generally as smart as the average human 90%+ of the time. A lack of perfection, to me, doesn't mean it isn't AGI.
>> My experience is that LLMs are generally as smart as the average human 90%+ of the time. A lack of perfection to me doesn't mean it isn't AGI.
In my experience, they contain more information than any human, but they are actually quite stupid. Reasoning is not something they do well at all. But even if I skip that, they cannot learn. Inference is separate from training, so they cannot learn new things other than by working with words in a context window, and even then they will only mimic rather than extrapolate anything new.
It's not the lack of perfect, it's the lack of reasoning and learning.
1 reply →
> If you showed someone what our computers can do with the latest LLMs now to someone 5 years ago they would probably say it sure looks a lot like AGI.
But this is a CPU! It's not a GPU / TPU. Even if you think we've achieved AGI, this is not where the matrix multiplication magic happens. It's pure marketing hype.
I did AI back before it was cool, and I think we have AGI. IMO the whole distinction was between extremely narrow AI and general intelligence. A classifier for engine failure can only do that; a route planner can only do that…
Now we have things I can ask a pretty arbitrary question and they can answer it. Translate, understand nuance (the multitude of ways of parsing sentences; getting sarcasm was an unsolved problem), write code, go and read and find answers elsewhere, use tools… these aren’t one-trick ponies.
There are finer points to this where the level of autonomy or learning over time may be important parts to you but to me it was the generality that was the important part. And I think we’re clearly there.
AGI doesn’t have to be human level, and it doesn’t have to be equal to experts in every field all at once.
An interesting perspective: general, absolutely, just nowhere near superhuman in all kinds of tasks. Not even close to human in many. But intelligent? No doubt, far beyond any remotely realistic expectation.
But that seems almost like an unavoidable trade-off. Fiction about the old "AI means logic!" type of AI is full of thought experiments where the logic imposes a limitation and those fictional challenges appear to be just what the AI we have excels at.
> LLMs are actually smarter than the majority of humans right now
I consider myself a bit of a misanthrope but this makes me an optimist by comparison.
Even stupid people are waaaaaay smarter than any LLM.
The problem is the continued habit humans have of anthropomorphizing computers that spit out pretty words. It’s like Eliza only prettier. More useful for sure. Still just a computer.
I really feel like we have not encountered the same stupid people. Most stupid people I know respond to every question with some form of will-not-attempt. What's 74 times 2? Use a calculator! Should I drive or walk to the car wash? Not my problem! How many R's in strawberry? Who cares! They'll lose to the LLM 100%.
The cheapest Aliexpress calculator can multiply much bigger numbers than I can in my head, and it can do it instantly. Does that mean that the calculator is “smarter” than me?
That's actually proving that they indeed are smarter than LLMs - by choosing to not deal and waste time, water and energy on useless benchmarks.
> Still just a computer.
I don't believe in a separation of mind and spirit. So I do think that fundamentally, outside of a reliance on quantum effects in cognition (some have theorized this, but it isn't proven), its processes can be replicated in a fashion in computers. So I think that intelligence likely can be "just a computer" in theory, and I think we are in the era where this is now true.
I don't believe in "spirits" from the get go. I think it's certainly theoretically possible that we could mimic human thought with a computer (quantum or otherwise) but I do not think that the LLMs we have now are doing that. I'd say that what we have right now is "just a computer".
This doesn't mean they aren't useful, I like Claude a lot, but I don't buy that it's AGI.
A human can think logically with reason; that's not to say they are smart or smarter. But LLMs cannot. You can convince an LLM that anything is correct and it will believe you. You can't convince a human that anything is correct.
I can't argue that LLMs don't know an absolutely insane amount of information about everything. But you can't just say LLMs are smarter than most humans. We've already decided that smartness is not about how much data you know, but about thinking about that data with logical reasoning, including the fact that it may or may not be true.
I can run an LLM through absolutely incorrect data and tell it that data is 100% true, then ask it questions about that data and get those incorrect results as answers. That's not easy to do with humans.
That just implies LLMs are suggestible. The same is true of children. As we get older and build a more complete world model in our heads, it's harder to get us to believe things which go against that model.
Tell a 5-year-old about Santa, and they will believe it sincerely. Do the same with a 30-year-old immigrant who has never heard of Santa, and I suspect you'll have a harder time.
That's not because the 5-year-old is dumber, but just because their life experience ("training data") is much more limited.
Even so, trying to convince a modern LLM of something ridiculous is getting harder. I invite you to try telling ChatGPT or Gemini that the president died a week ago and was replaced by a body-double facsimile until January 2027, so that Vance can have a full term. I suspect you'll have significant difficulty.
> Do the same with a 30-year old immigrant who has never heard of Santa, and I suspect you'll have a harder time.
There's a plethora of people who convert to religion at an older age, and that seems far more far fetched than Santa.
2 replies →
The problem with definitions is that they are all wrong when you try to apply them outside mathematical models. Descriptive terms are more useful than normative ones when you are dealing with the real world. Their meaning naturally evolves when people understand the topic better.
General intelligence, as a description, covers many aspects of intelligence. I would say that the current AIs are almost but not quite generally intelligent. They still have severe deficiencies in learning and long-term memory. As a consequence, they tend to get worse rather than better with experience. To work around those deficiencies, people routinely discard the context and start over with a fresh instance.
AGI wouldn't lie to me every chance it got. Current LLMs are just slop generators, nothing more.
> I would argue that LLMs are actually smarter than the majority of humans right now
This (surprisingly common) view belies a wild misunderstanding of how LLMs work.
"look, it completely lied about params that don't exist in a CLI!"
AGI doesn't mean perfect. It means human like and the latest models are pretty human like in terms of their fallibility and capabilities.