Comment by ryanSrich
11 days ago
AGI is here. 90%+ of white collar work _can_ be done by an LLM. We are simply missing a tested orchestration layer. Speaking broadly about knowledge work here, there is almost nothing that a human is better at than Opus 4.6. Especially if you're a typical office worker whose job is done primarily on a computer, if that's all AGI is, then yeah, it's here.
Opus is the very best and I still throw away most of what it produces. If I did not carefully vet its work I would degrade my code bases so quickly. To accurately measure the value of AI you must include the negative in your sum.
I would and have done the same with Jr. devs. It's not an argument against it being AGI.
I'm countering the basis of your original claim: "there is almost nothing that a human is better at than Opus 4.6". This is simply not true.
That "simple orchestration layer" (paraphrased) is what I consider the AGI.
But yeah, I suspect LLMs may actually get close enough. "Just" add more reasoning loops and corresponding compute.
It is objectively grotesquely wasteful (a human brain operates on 12 to 25 watts and would vastly outperform something like that), but it would still be cataclysmic.
/layperson, in case that wasn't obvious
If we can get AI down to this power requirement then it's over for humans. Just think of how many copies of itself, thinking at the level of the smartest humans, it could run at once. Also think of all the places around the world where the hardware could hide and keep itself powered.
> a human brain operates on 12 to 25 watts
Yeah, but a human brain without the human attached to it is pretty useless. In the US, it averages out to around 2 kW per person for residential energy usage, or 9 kW if you include transportation and other primary energy usage too.
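For scale, the figures quoted above can be compared directly. This is a rough sketch that takes the 12 to 25 W brain estimate and the 2 kW / 9 kW per-person figures from this thread at face value; they are the commenters' numbers, not verified data, and the function name is made up for illustration.

```python
# Figures as quoted in the thread; treated as assumptions, not verified data.
BRAIN_WATTS = (12, 25)      # claimed power range for a human brain
RESIDENTIAL_WATTS = 2_000   # claimed US residential average per person
TOTAL_WATTS = 9_000         # claimed US average incl. transportation etc.


def ratio_range(total_watts, brain_watts=BRAIN_WATTS):
    """Return (low, high) multiples of brain power that total_watts represents."""
    low_w, high_w = brain_watts
    # Dividing by the larger brain estimate gives the smaller multiple.
    return total_watts / high_w, total_watts / low_w


if __name__ == "__main__":
    for label, watts in [("residential", RESIDENTIAL_WATTS),
                         ("all primary energy", TOTAL_WATTS)]:
        low, high = ratio_range(watts)
        print(f"{label}: {low:.0f}x to {high:.0f}x the brain's own draw")
```

On these numbers, keeping the human attached costs somewhere between 80x and 750x what the brain alone draws.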
Fair.
Maybe The Matrix (1999), with its human battery farms, was on to something. :)
I think "tested" is the hard part. The simple part seems to be there already: loops and crons exist, and computer use is getting pretty close.
I ran a quick experiment with Claude and Perplexity, both free versions. I input some retirement info (portfolio balances, etc.), my age, my desired retirement age, and so on. Simple stuff that a financial planner would have no issue with. Perplexity was very good on the surface: it rarely made an obvious blunder or error, and it was fast. Claude was much slower and, despite me inputting my exact birthdate, kept messing up my age by as much as 18 months. This obviously screws up retirement planning. I also asked some questions about how RMDs would affect my taxes, and asked for some strategies. Perplexity was convinced that I should do a Roth conversion to max out the 22% bracket, while Claude thought that the tax savings would be minimal.
Mind you, I used the EXACT same prompts. I don't know which model Perplexity was using, since the free version chooses from several (including Claude 3.0).
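The age arithmetic that went wrong here is deterministic, so an orchestration layer could compute it in ordinary code rather than leave it to the model. A minimal sketch, assuming completed years and months are what the planning needs; the function name and the dates in the usage note are hypothetical, not the commenter's.

```python
from datetime import date


def age_in_years_and_months(birthdate: date, on: date) -> tuple:
    """Return (completed years, completed months) between birthdate and `on`."""
    if on < birthdate:
        raise ValueError("reference date is before the birthdate")
    # Count whole calendar months, then drop one if this month's
    # birthday-of-the-month has not been reached yet.
    months = (on.year - birthdate.year) * 12 + (on.month - birthdate.month)
    if on.day < birthdate.day:
        months -= 1
    return divmod(months, 12)
```

For a hypothetical birthdate of 1960-06-15, `age_in_years_and_months(date(1960, 6, 15), date(2025, 6, 14))` gives `(64, 11)` and one day later it gives `(65, 0)`. An 18-month error can easily land on the wrong side of a date-based threshold like that.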
AGI is when it can do all intellectual work that can be done by humans. It can improve its own intelligence and create a feedback loop because it is as smart as the humans who created it.
No, that is ASI. No human can do all intellectual work themselves. You have millions of different human models based on roughly the same architecture to do that.
When you have a single model that can do all you require, you are looking at something that can run billions of copies of itself and cause an intelligence explosion or an apocalypse.
"Artificial general intelligence (AGI) is a type of artificial intelligence that matches or surpasses human capabilities across virtually all cognitive tasks."
This has always been my personal definition of AGI. But the market and industry don't agree. So I've backed off on that and have more or less settled on "can do most of the knowledge work that a human can do".
Why the super-high bar? What's unsatisfying is this: aren't the 'dumbest' humans still a general intelligence, one that we're nearly past depending on how you squint and measure?
It feels like an arbitrary bar, perhaps set to make sure we aren't putting AIs over humans, even though they are most certainly in the superhuman category on a rapidly growing number of tasks.
Opus 4.6 via the API will tell you it's still 2025, admit it's wrong, then revert to being convinced it's 2025 as it nears its context limit.
I'll go so far as to say LLM agents are AGI-lite but saying we "just need the orchestration layer" is like saying ok we have a couple neurons, now we just need the rest of the human.
Giving Opus a memory or real-time access to the current year is trivial. I don't see how that's an argument against it being AGI.
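One way to read "real-time access to the current year" is that the calling code stamps today's date into the system prompt on every request. A minimal sketch of that idea, assuming a plain string prompt; the function name and wording are hypothetical and this is not any vendor's API.

```python
from datetime import datetime, timezone
from typing import Optional


def with_current_date(system_prompt: str, now: Optional[datetime] = None) -> str:
    """Append today's date (UTC) to a system prompt.

    `now` can be passed explicitly for testing; otherwise the clock is read.
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%d")
    return (
        f"{system_prompt}\n\n"
        f"Today's date is {stamp} (UTC). "
        "Prefer this over any date implied by training data."
    )
```

Whether the model keeps honoring that line as it nears its context limit, which is the failure the parent comment describes, is a separate question this sketch does not answer.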
> there is almost nothing that a human is better at than Opus 4.6.
Lolwut. I keep having to correct Claude at trivial code organization tasks. The code it writes is correct; it’s just ham-fisted and violates DRY in unholy ways.
And I’m not even a great coder…
I’m very pro AI coding and use it all day long, but I also wouldn’t say “the code it writes is correct”. It will produce all kinds of bugs, vulnerabilities, performance problems, memory leaks, etc., unless carefully guided.
So it's even more human than we thought
This is entirely solvable with skills, memory, context, and further prompting. All of which can be done in a way that's reliable and repeatable.
You wouldn't expect a Jr. dev to be the best at keeping things DRY either.
> there is almost nothing that a human is better at than Opus 4.6.
> You wouldn't expect a Jr. dev to be the best at keeping things DRY either.
So a junior dev is better than almost all humans at everything?
Yeah, the “you’re holding it wrong” argument. Never takes long to pop up.
> You wouldn't expect a Jr. dev to be the best at keeping things DRY either.
Did you read the comment I replied to? The premise was
> there is almost nothing that a human is better at than Opus 4.6.
So which is it? Is Claude the junior dev “better at” most things than a human or not? Sorry, you can’t play your argument both ways.
> violates DRY in unholy ways
Well said
Can LLMs manipulate spreadsheets?
Yes. https://x.com/claudeai/status/2014834616889475508?s=46