Plausibly now. "As we wrote in the Project Glasswing announcement, we do not plan to make Mythos Preview generally available"
I remember when they didn't plan to give LLMs internet access for the same safety reasons.
Fictional timeline that holds up pretty well so far: https://ai-2027.com/
Welp, that was a scary read.
"So far" is two entries: "AI companies build bigger datacenters" and "AI is being used for AI research with modest success".
Right now these models are basically good for automation, not innovation. Think of Karpathy's "auto research", where you use the model to automate your hyperparameter sweeps and so on. The researcher/engineer decides what experiments to run and builds an LLM harness to automate them, and the bottleneck remains the compute to run those experiments at scale.
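A minimal sketch of the kind of harness described above. Every name here is hypothetical, and the LLM call is stubbed with a toy heuristic so the loop is runnable on its own; in the real setup `propose_config` would be a chat-completion request and `run_experiment` would be the expensive training run that dominates the cost.

```python
import random

random.seed(0)

def propose_config(history):
    """Stand-in for the LLM proposing the next config to try.
    Toy heuristic: perturb the best config seen so far."""
    if not history:
        return {"lr": 1e-3, "batch_size": 32}
    best = min(history, key=lambda r: r["loss"])["config"]
    return {
        "lr": best["lr"] * random.choice([0.5, 1.0, 2.0]),
        "batch_size": random.choice([16, 32, 64]),
    }

def run_experiment(config):
    """Placeholder for the actual training run -- the compute bottleneck.
    Returns a fake loss with a minimum near lr = 3e-4."""
    return abs(config["lr"] - 3e-4) + config["batch_size"] / 1000

def sweep(budget=10):
    history = []
    for _ in range(budget):
        config = propose_config(history)
        history.append({"config": config, "loss": run_experiment(config)})
    return min(history, key=lambda r: r["loss"])

best = sweep()
```

The point of the sketch is where the human sits: the researcher wrote `run_experiment` and decided what "loss" means; the model only automates the proposal loop.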
Moving beyond LLMs to AGI, not just better LLMs, is going to require architectural and algorithmic changes. Maybe an LLM can help suggest directions, but even then it's up to a researcher to take those on board and design and automate experiments to see whether any of the ideas pan out.
Companies are already doing this, but they are never going to stop releasing/selling models, since the models are the product, and the revenue from each generation is what keeps the ship afloat and pays for the salaries and compute to develop the next one.
The endgame isn't "AGI, then world domination"; it's building a business around selling ever-better models and praying that the revenue each generation brings in can keep up with the cost of building it.
I think it is naive to think the government (US or China most probably) will just let some random company control something so powerful and dangerous.
I think it is naive to think that artificial super intelligence will be controlled by anyone.
If it is smarter than all humans combined at everything why would any humans collectively control the ai?
All the ants in your backyard combined still don't make any decisions for you.
You'd probably listen to those ants if they put you in a harness and had a little ant-sized remote control that could just, you know, turn you off.
1 reply →
Isn't the U.S. government either completely asleep at the wheel or captured by those very same "random" companies? I realize the administration got all pissy with Anthropic, but it sounds like the government and its contractors are still using their models.
Yeah, but they still (at least as far as the public knows) don't possess anything that could be called AGI. As these capabilities increase, though, they'll probably get an offer they can't refuse sooner or later.
When the benchmarks actually mean something
Can LLMs be AGI at all?
What can a SOTA LLM not answer that the average person can? It's already more intelligent than any polymath that ever existed, it just lacks motivation and agency.
And has ADHD, but yeah, I'm fairly convinced that AGI is already here.
My understanding is no. But the definition of AGI isn't well defined and keeps evolving, making the assessment pretty much impossible.
Can an LLM program real AGI faster than a human?
Good question. I would guess no - but it could help you build one. Am I mistaken?
They could help you build an AGI if someone else has already built AGI and published it on GitHub.
3 replies →
No I think that’s accurate. They seem more like an oracle to me. Or as someone put it here, it’s a vectorization of (most/all?) human knowledge, which we can replay back in various permutations.
LLMs and human intelligence overlap, but they are not the same. What LLMs show is that we don't need AGI to be impressed. For example, LLMs are not good at playing games such as Go [1].
[1] https://arxiv.org/abs/2601.16447
I don't see why not, especially with computer use and vision capabilities. Are you talking about their lack of physical embodiment? AGI is about cognitive ability, not physical. Think of someone like Stephen Hawking, an example of having extraordinary general intelligence despite severe physical limitations.
I would assume somewhere in both the companies there's a Ralph loop running with the prompt "Make AGI".
Kinda makes me think of the Infinite Improbability Drive.
Weird how Claude Code itself is still so buggy though (though I get they don't necessarily care)
It isn't that weird. Just look at the gemini-cli repo; it's a gong show. The issue isn't just that LLMs can sometimes be wrong, it's that all the existing SDLCs (software development life cycles) were never meant to iterate this quickly.
If the system (the code base, in this case) is changing rapidly, the probability increases that any given change will interact poorly with any other. No single person in those code bases can maintain a working understanding of them, because they change so quickly. So when someone LGTMs an LLM-generated PR, they likely don't have a great understanding of the impact it is going to have.
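The scaling argument in that comment can be made concrete: if each pair of concurrent changes conflicts independently with some small probability p, the chance that at least one pair interacts badly grows with the square of the number of in-flight changes. A back-of-the-envelope sketch (p = 0.01 is an assumed illustrative value, not a measurement):

```python
from math import comb

def prob_any_conflict(n_changes, p=0.01):
    """P(at least one bad pairwise interaction) among n concurrent changes,
    assuming each pair conflicts independently with probability p."""
    pairs = comb(n_changes, 2)        # number of change pairs grows ~ n^2
    return 1 - (1 - p) ** pairs       # complement of "no pair conflicts"

for n in (5, 20, 80):
    print(n, round(prob_any_conflict(n), 3))
```

Under these toy assumptions, going from a handful of in-flight changes to dozens pushes the probability of some bad interaction from "unlikely" toward "near certain", which is the dynamic the comment is pointing at.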
Probably right now because they're keeping it for themselves?
They already do, just not the way you said: they always have an internal model that is better and that they use themselves, and they release based on competition.
why_not_both.gif
It will arrive in the same DLC as flying cars.
Now, I guess. They aren't releasing this one generally. I assume they are using it internally.
I mean, guess why Anthropic is pulling ahead...? One can have one's cake and eat it too.