Comment by visarga
8 days ago
The story is entertaining, but it rests on a big fallacy - progress is not a function of compute or model size alone. Treating it that way is almost magical thinking. What matters most is the training set.
During the GPT-3 era there was plenty of organic text to scale into, and compute seemed to be the bottleneck. But we quickly exhausted that text, and now we try other ideas - synthetic reasoning chains, or plain synthetic text, for example. But you can't do that fully in silico.
Creating new and valuable text requires exploration and validation. LLMs can ideate very well, so we are covered on that side. But we can only automate validation in math and code, not in other fields.
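To make that concrete, here is a minimal sketch (my own toy example, every name in it hypothetical) of what "automatable validation" looks like for code: a candidate function, say one an LLM produced, gets checked mechanically against a trusted reference, with no human judgement in the loop - something that has no analogue in, say, biology or sociology:

    # Toy verifier: check a candidate (e.g. LLM-generated) sort function
    # against a trusted reference on random inputs. All names here are
    # made up; the point is that the check needs no human judgement.
    import random

    def candidate_sort(xs):          # pretend this came from an LLM
        return sorted(xs)

    def validate(fn, trials=1000):
        for _ in range(trials):
            xs = [random.randint(-100, 100) for _ in range(random.randint(0, 20))]
            if fn(list(xs)) != sorted(xs):
                return False         # counterexample found: reject the sample
        return True                  # passed all checks: keep it as training signal

    print(validate(candidate_sort))  # True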
Real-world validation thus becomes the bottleneck for progress. The world jealously guards its secrets, and we need to spend exponentially more effort to pry them away, because the low-hanging fruit was picked long ago.
If I am right, this has implications for the speed of progress. Exponential friction of validation opposes exponential scaling of compute. The story also says an AI could be created in secret, which runs against the validation principle - we validate faster together, and nobody can secretly out-validate humanity. It's like a blockchain: we depend on everyone else.
Did we read the same article?
They clearly mention, take into account, and extrapolate this. LLMs first scaled via data; now it's test-time compute, and recent developments (R1) clearly show this is not exhausted yet (i.e. RL on synthetically, in-silico, generated CoT), which implies scaling with compute. The authors then outline further potential research developments that could continue this dynamic - literally things that have already been discovered, just not yet incorporated into leading-edge models.
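For anyone unfamiliar with the idea, here is a rough sketch of the spirit of "RL on synthetically generated CoT" (my own toy simplification, not the actual R1 recipe): sample many chains of thought, keep only those whose final answer passes an automatic check, and train on the survivors:

    # Toy sketch (mine, not the R1 pipeline): rejection-sample synthetic
    # chains of thought, keep only the ones whose answer verifies, and use
    # the survivors as training data / reward signal.
    import random

    def toy_generate(a, b):
        """Stand-in for an LLM: 'reasons' about a+b, sometimes wrongly."""
        guess = a + b + random.choice([0, 0, 0, 1, -1])   # occasionally off by one
        cot = f"{a} plus {b} gives {guess}"
        return cot, guess

    def collect_verified_cot(problems, samples_per_problem=8):
        kept = []
        for a, b in problems:
            for _ in range(samples_per_problem):
                cot, answer = toy_generate(a, b)
                if answer == a + b:                       # automatic verification
                    kept.append((f"{a}+{b}=?", cot))      # keep only verified chains
        return kept

    print(len(collect_verified_cot([(2, 3), (10, 7)])))  # count varies run to run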
Real-world evidence confirms the authors' thesis. There have been a lot of sceptics about AI scaling, somewhat justified ("whoom", a.k.a. fast take-off, hasn't happened - yet), but the sceptics' fundamental thesis - "real-world data has been exhausted, the next algorithmic breakthroughs will be hard and unpredictable" - has been wrong. The reality is that, while data has been exhausted, incremental research effort has produced better and better models (o1, R1, o3, and now Gemini 2.5, which is a huge jump! [1]). This is similar to how Moore's Law works: it's not a given that CPUs get better exponentially, it still requires effort, maybe with diminishing returns, but the law keeps working...
If we ever get to models being able to usefully contribute to research, either on the implementation side or on the research-ideas side (which they CANNOT do yet, at least not Gemini 2.5 Pro, the public SOTA, unless my prompting is REALLY bad), it's about to get super-exponential.
Edit: then once you get to actual general intelligence (let alone super-intelligence) the real-world impact will quickly follow.
Well, based on what I'm reading, the OP's point is that not all validation (hence "fully"), and maybe not even most of it, can be done in silico. I think we all agree on that, and that's the major bottleneck to making agents useful - you have to have a human in the loop to closely guardrail the whole process.
Of course you can get a lot of mileage out of synthetically generated CoT, but whether that leads to LLMs speeding up LLM development is a big IF.
No, the entire point of this article is that when you get to self-improving AI, it will become generally intelligent, and then you can use that to solve robotics, medicine, etc. (just as a generally intelligent baby can (eventually) work out how to move boxes, assemble cars, do experiments in labs, etc. - there's nothing special about a human baby, it's just generally intelligent).
Yeah, I think the math+code reasoning models, like o1 and R1, are doing what can be done with just pure compute, without real-world validation. But the real world is complex; we can't simulate it. Why do we build particle accelerators, fusion reactor prototypes, space telescopes, year-long vaccine trials? Because we need to validate ideas in the real world when it cannot be done theoretically or computationally.
Best reply in this entire thread, and I align with your thinking entirely. I also absolutely hate this idea amongst tech-oriented communities that because an AI can do some algebra and program an 8-bit video game quickly and without any mistakes, it's already overtaking humanity. Extrapolating from that idea to some future version of these models, they may be capable of solving grad-school-level physics problems and programming entire AAA video games, but again - that's not what _humanity_ is about. There is so much more to being human than fucking programming and science (and I'm saying this as an actual nuclear physicist). And so, just like you said, the AI arms race is about getting it good at _known_ science/engineering, fields in which "correctness" is very easy to validate. But most of human interaction exists in a grey zone.
Thanks for this.
> that's not what _humanity_ is about
I've not spent too long thinking on the following, so I'm prepared for someone to say I'm totally wrong, but:
I feel like the services economy can be broadly broken down into: pleasure, progress, and chores. Pleasure being poetry/literature, movies, hospitality, etc.; progress being the examples you gave like science/engineering, mathematics; and chores being things humans need in order to coordinate or satisfy an obligation (accountants, lawyers, salesmen).
In this case, if we assume AI can deal with things not in the grey zone, then it can deal with "progress" and many "chores", which are massive chunks of human output. There's not much grey zone to them. (Well, there is, but there are many correct solutions: equivalent pieces of code that are all acceptable, multiple versions of a tax return, each claiming different deductions, that would all get past the IRS, etc.)
I have considered this too. I frame it as problem solving. We are solving problems across all fields, from investing to designing, construction, sales, entertainment, science, medicine, repair. What do you need when you are solving problems? You need to know the best action you can take in a situation. How is AI going to know all that? Some things are only tacitly known by key people, some things are guarded secrets (how do you make cutting-edge chips, or innovative drugs?), some rely on experience that is not written down. Many of those problems have not even been fully explored; they are an open field of trial and error.
AI progress depends not just on ideation speed, but on validation speed. And validation in some fields needs to pass through the physical world, which makes it expensive, slow, and rate-limited. Hence I don't think AI can reach the singularity. That would only be possible if validation were as easy to scale as ideation.
I'm not sure where construction and physical work go in your categories. Progress and chores, maybe. But I think AI will struggle in the physical domain - validation is difficult, and repeated experiments to train on are either too risky, too costly, or potentially too damaging (i.e. in the real world failure is often not an option, unlike software, where test benches can allow controlled failure in a simulated environment).
> programming entire AAA video games
Even this is questionable, because we're seeing them make forms and solve leetcode problems, but no LLM has yet created a new approach, reduced existing unnecessary complexity (which we have created mountains of), or made something truly new in general. All they seem to do is rehash millions of "mainstream" works, and AAA isn't mainstream. Cranking up the parameter count or the time spent beating around the bush (a.k.a. CoT) doesn't magically substitute for the lack of a knowledge graph with thick enough edges, so creating a next-gen AAA video game is far out of scope for an LLM's abilities. They are stuck in 2020 office jobs and weekend open-source tech, programming-wise.
"stuck" is a bit strong of a term. 6 months ago I remember preferring to write even Python code myself because Copilot would get most things wrong. My most successful usage of Copilot was getting it to write CRUD and tests. These days, I can give Claude Sonnet in Cursor's agent mode a high-level Rust programming task (e.g. write a certain macro that would allow a user to define X) and it'll modify across my codebase, and generally the thing just works.
At the current rate of progress, I really do think that in another 6 months they'll be pretty good at tackling technical debt and overcomplication, at least in codebases that have good unit/integration test coverage or are written in very strongly typed languages with a type-friendly structure. (Of course, those usually aren't the codebases needing significant refactoring, but I think AIs are decent at writing unit tests against existing code too.)
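To illustrate what that coverage buys (a made-up example, not from any real codebase): a characterization test pins down the current behaviour, so an AI-assisted refactor can be accepted or rejected mechanically rather than by eyeballing the diff:

    # Minimal sketch: a characterization test that pins existing behaviour,
    # so an AI-assisted refactor either preserves it or fails loudly.
    # `normalize_username` is a made-up function standing in for legacy code.
    import unittest

    def normalize_username(raw):           # legacy behaviour we want preserved
        return raw.strip().lower().replace(" ", "_")

    class TestNormalizeUsername(unittest.TestCase):
        def test_known_inputs(self):
            cases = {
                "  Alice Smith ": "alice_smith",
                "BOB": "bob",
                "c d e": "c_d_e",
            }
            for raw, expected in cases.items():
                self.assertEqual(normalize_username(raw), expected)

    if __name__ == "__main__":
        unittest.main()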
"They are stuck in 2020 office jobs and weekend open source tech, programming-wise."
You say that like it's nothing special! Honestly I'm still in awe at the ability of modern LLMs to do any kind of programming. It's weird how something that would have been science fiction 5 years ago is now normalised.
OK, but getting good at science/engineering is what matters, because that's what gives AI, and the people who wield it, power. Once AI is able to build chips and datacenters autonomously, that's when the singularity starts. AI doesn't need to understand humans or act human-like to do those things.
I think what they mean is that the fundamental question is IF any intelligence can really break out of its confined area of expertise and control a substantial amount of the world just by excelling in highly verifiable domains. Because a lot of what humans need to do involves decisions based on expertise and judgement that, within our systems, follow no transparent rules.
I guess it's the age-old question of whether we really know what we are doing ("experience") or we just tumble through life and it works out because the overall system of humans interacting with each other is big enough. The current state of world politics makes me think it's the latter.
I don't necessarily think you're wrong, and in general I do agree with you, to an extent, that it seems like self-centered computer scientist/SWE hubris to think that automating programming is ~AGI.
HOWEVER, there is a case to be made that software is an insanely powerful lever for many industries, especially AI. And if current AI gets good enough at software problems that it can improve its own infrastructure or even ideate new model architectures, then we would (in this hypothetical case) potentially reach an "intelligence explosion," which may _actually_ yield a true, generalized intelligence.
So as a cynic, while I think the intermediary goal of many of these so-called AGI companies is just your usual SaaS automation slop, because that's the easiest industry to disrupt and extract money from (and the people at these companies only really know how software works, as opposed to having knowledge of other things like chemistry, biology, etc.), I also think that in theory, being a very fast and low-cost programming agent is a bit more powerful than you think.
I agree with your point about the validation bottleneck becoming dominant over raw compute and simple model scaling. However, I wonder if we're underestimating the potential headroom for sheer efficiency breakthroughs at our levels of intelligence.
Von Neumann for example was incredibly brilliant, yet his brain presumably ran on roughly the same power budget as anyone else's. I mean, did he have to eat mountains of food to fuel those thoughts? ;)
So it looks like massive gains in intelligence or capability might not require proportionally massive increases in fundamental inputs, at least up to the highest levels of intelligence a human can reach. And if that's true for the human brain, why not for other architectures of intelligence?
P.S. It's funny, I was talking about something along the lines of what you said with a friend just a few minutes before reading your comment so when I saw it I felt that I had to comment :)
I think you are underestimating the context; we all stand on the shoulders of giants. Consider what would happen if a young Einstein, at the age of 5, were marooned on an island and recovered 30 years later. Would he have any deep insights to dazzle us with? I don't think he would.
Hayy ibn Yaqdhan: nature vs. nurture and the relative nature of intelligence, iirc.
This is what I think as well. Unfortunately for the AI proponents, they have already made an example of the software industry. It's in news reports in the US and globally; most people are no longer recommending getting into the industry, etc. Software, for better or worse, has become an example for other industries of what "not to do", both with respect to data (online and open) and culture (e.g. open source, open tests, etc.).
Anecdotally most people I know are against AI - they see more negatives from it than positives. Reading things like this just reinforces that belief.
It raises the question: why are we even doing this? Why did we invent this? Most people aren't interested in creating, at best, a "worthy successor" that eliminates them and potentially their children; they see that goal as nothing but naive and, dare I say it, wrong. All these thoughts will come to most people from reading the above.
History unfolds without anyone at the helm. It just happens, like a pachinko ball falling down the board. Global economic structures will push the development of AI and they're extremely hard to overwhelm.
For better or worse, decisions with great impact are made by people in power. This view of history as a pachinko ball may numb us into not questioning the people in power.
Many tasks are amenable to simulation training and synthetic data. Math proofs, virtual game environments, programming.
And we haven't run out of all data. High-quality text data may be exhausted, but we have many, many life-years' worth of video. Being able to predict visual imagery means building a physical world model. Combine this passive observation with active experimentation in simulated and real environments and you get millions of hours of navigating and steering a causal world. DeepMind has been hooking up their models to real robots to let them actively explore and generate interesting training data for a long time. There's more to DL than LLMs.
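As a toy illustration of that last point (my own sketch, nothing to do with DeepMind's actual setup), even a trivial simulated environment can emit unlimited, perfectly labelled transition data for free:

    # Toy sketch: a 1-D gridworld generating unlimited synthetic
    # (state, action, next_state) transitions for training a world model.
    import random

    def step(state, action, size=10):
        """Deterministic toy dynamics: move left/right, clipped to the grid."""
        return max(0, min(size - 1, state + action))

    def collect_transitions(episodes=100, horizon=20):
        data = []
        for _ in range(episodes):
            state = random.randrange(10)
            for _ in range(horizon):
                action = random.choice([-1, 1])       # explore with random actions
                nxt = step(state, action)
                data.append((state, action, nxt))     # free, perfectly labelled data
                state = nxt
        return data

    print(len(collect_transitions()))  # 2000 transitions, no human labelling needed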
This is true; a lot of progress can still happen based on simulation and synthetic data. But I am considering the long-term game. In the long term we can't substitute simulation for reality. We can't even predict whether a 3-body system will eventually eject an object, or whether a piece of code will halt for all possible inputs. Physical systems that implement Turing machines are undecidable; even fluid flows are. The core problem is that recursive processes create a knowledge gap, and we can't cross that gap unless we walk the full recursion; there is no way to predict the outcome from outside. The real world is such an undecidable recursive process. AI can still make progress, but not at exponential speed decoupled from the real world, and not in isolation.
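To make the halting example concrete, here is the classic Collatz iteration: whether this loop terminates for every positive integer input is a famous open problem, even though the code is only a few lines long:

    # The Collatz iteration: whether this loop terminates for *every* positive
    # integer n is a long-standing open problem, despite the code being trivial.
    def collatz_steps(n):
        steps = 0
        while n != 1:
            n = n // 2 if n % 2 == 0 else 3 * n + 1
            steps += 1
        return steps

    print(collatz_steps(27))  # 111 steps; no proof exists that every n terminates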