Comment by geon
3 days ago
The LLM architectures we have now have reached their full potential already, so going further would require something completely different. It isn't a matter of refining the existing tech. By contrast, the internet of 1997 was technologically almost identical to what we have today; the real change since then has been sociological, not technological.
To make a car analogy: the current LLMs are not the early cars, but the most refined horse-drawn carriages. No matter how much money is poured into them, you won't find the future there.
The current generation of LLMs has convinced me that we already have the compute and the data needed for AGI; we likely just need a new architecture. But I really think such an architecture could be right around the corner. It appears to me that the building blocks are there; it would just take someone with the right luck and genius to make it happen.
> The current generation of LLMs has convinced me that we already have the compute and the data needed for AGI; we likely just need a new architecture.
I think this is one of the greatest fallacies surrounding LLMs. This one, and the other one: scaling compute! The models are plenty fine. What they need is not better models or more compute; they need better data, or better feedback to keep iterating until they reach the solution.
Take AlphaZero, for example: it was a simple convolutional network, not great compared to LLMs and small relative to recent models, and yet it beat the best of us at our own game. Why? Because it had unlimited access to its environment and could play games against other versions of itself.
The same goes for the whole Alpha* family (AlphaStar, AlphaTensor, AlphaCode, AlphaGeometry and so on): trained with copious amounts of interactive feedback, they could reach top human level or surpass humans in specific domains.
What AI needs is feedback: environments, tools, and real-world interaction that expose the limitations in the model and provide immediate help to overcome them. Not unlike human engineers and scientists: take their labs and experiments away and they can't discover shit.
It's also called the ideation-validation loop. AI can ideate, but it needs validation from outside. That is why I insist the models are not the bottleneck.
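A toy sketch of that loop in Python, for illustration only. It is not how AlphaZero or any Alpha* system actually works; `ideate_validate`, `propose` and `validate` are hypothetical names, and the "environment" is just a hidden number. The proposer can only guess; it is the outside verdict (too low, too high, correct) that lets it converge:

```python
from typing import Callable, Optional


def ideate_validate(
    propose: Callable[[int, int], int],
    validate: Callable[[int], int],
    lo: int,
    hi: int,
    max_iters: int = 100,
) -> Optional[int]:
    """Toy ideation-validation loop (hypothetical, for illustration only).

    propose(lo, hi) plays the model: it suggests a candidate in [lo, hi].
    validate(candidate) plays the external environment: it returns 0 if the
    candidate is correct, a negative number if it is too small, and a
    positive number if it is too large.
    """
    for _ in range(max_iters):
        if lo > hi:
            return None  # feedback has ruled out every candidate
        candidate = propose(lo, hi)  # ideation
        verdict = validate(candidate)  # validation from outside
        if verdict == 0:
            return candidate
        # Use the outside feedback to shrink the search space.
        if verdict < 0:
            lo = candidate + 1
        else:
            hi = candidate - 1
    return None


if __name__ == "__main__":
    secret = 37  # stands in for the "right answer" only the environment knows

    def check(candidate: int) -> int:
        return (candidate > secret) - (candidate < secret)

    def midpoint(lo: int, hi: int) -> int:
        return (lo + hi) // 2

    print(ideate_validate(midpoint, check, 0, 100))  # 37
```

The proposer here never gets smarter on its own; it only improves because `validate` keeps narrowing `lo` and `hi`.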
For AlphaZero, the "better data" was trivial. The environment of board games is extremely simple. It just can't be compared to language models.
The problem with language is that there is no known correct answer. Everything is vague, ambiguous and open-ended. How would we even implement feedback for that?
So yes, we do need new models.
> The current generation of LLMs has convinced me that we already have the compute and the data needed for AGI; we likely just need a new architecture
This is likely true, but not for the reasons you think. This was arguably true 10 years ago too. A human brain uses approximately 100 watts per day, and unlike most models out there, the brain is ALWAYS in training mode. It has about 2 petabytes of storage.
In terms of raw capabilities, we have been there for a very long time.
The real challenge is finding the point where we can build something AGI-level with the stuff we have. Right now we might have the compute and data needed for AGI, but we might lack the tools needed to build a system that efficient. It's like a little dog trying to get into a fenced house: the most direct path between the dog and the house might not be accessible to the dog at that point because of its current capabilities (short legs, inability to jump high or push through the fence standing in between). So a path that is topologically longer might be the one that actually gets it to the house.
In case it's not obvious, AGI is the house, we are the little dog and the fence represents the current challenges to building AGI.
The notion that the brain uses less energy than an incandescent lightbulb and can store less data than YouTube does not mean we have had the compute and data needed to make AGI "for a very long time".
The human brain is not a 20-watt computer that learns from scratch on 2 petabytes of data (and "100 watts per day" is not right: a watt is already a rate of energy use, and the brain draws roughly 20 W). State manipulations performed in the brain can be more efficient than what we do in silicon. More importantly, its internal workings are the result of billions of years of evolution, and continue to change over the course of our lives. The learning a human does over its lifetime is assisted greatly by the reality of the physical body and the ability to interact with the real world to the extent that our body allows. Even then, we do not learn from scratch. We go through a curriculum that has been refined over millennia, building on knowledge and skills that were cultivated by our ancestors.
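Because watts already measure energy per unit time, a daily figure has to be expressed in watt-hours. A quick conversion in Python, assuming the commonly cited estimate of about 20 W drawn continuously (`daily_energy_kwh` is just an illustrative helper):

```python
HOURS_PER_DAY = 24


def daily_energy_kwh(power_watts: float) -> float:
    """Energy in kWh used over one day by something drawing power_watts continuously."""
    watt_hours = power_watts * HOURS_PER_DAY  # W x h = Wh
    return watt_hours / 1000  # Wh -> kWh


print(daily_energy_kwh(20))   # 0.48 (the usual ~20 W brain estimate)
print(daily_energy_kwh(100))  # 2.4 (if the brain really drew 100 W)
```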
An upper bound of compute needed to develop AGI that we can take from the human brain is not 20 watts and 2 petabytes of data, it is 4 billion years of evolution in a big and complex environment at molecular-level fidelity. Finding a tighter upper bound is left as an exercise for the reader.
Dial-up modems reached their full 56kbps potential in 1997, and going further required something completely different. It happened naturally to satisfy demand, and was done by many of the same companies and people; the change was technological, not sociological.
I think we're probably still far from the full potential of LLMs, but I don't see any obstacles to developing and switching to something better.
I don't think that comparison works very well at all.
We had plenty of options for better technologies, both available and in planning; 56k modems were just the cost-effective, lowest-common-denominator option of their era.
It's not nearly as clear that we have any proven, workable ideas for where to go beyond LLMs.
> Dial-up modems reached their full 56kbps potential in 1997
That's simply not true. Modems were basically the same tech in the DSL era, and using light instead of electricity is a very gradual refinement.
> we're probably still far from the full potential of LLMs
Then how come the returns are so extremely diminishing?
> I don't see any obstacles to developing and switching to something better.
The obstacle is that it needs to be invented. There was nothing stopping Newton from discovering relativity either. We simply have no idea what the road forward even looks like.
> The LLM architectures we have now have reached their full potential already.
How do we know that?
What we can say right now is that we've hit the point of diminishing returns, and the only way we're going to get significantly more capable models is through a technological advance that we cannot foresee (and that may not come for decades, if it ever comes).
Exactly. You're absolutely right to focus on that.
You could see some potential modifications. Some are already multimodal. You'd probably want something that changes the weights over time so the models can learn. It might be more like steam engines needing to be converted to petrol engines.