Comment by ACCount37
1 month ago
It would certainly be bizarre if the 8086 architecture, never designed to be the foundation of all home, office, and server computation, was the best CPU architecture ever made.
And it isn't. It's merely good enough.
That's what LLMs are. A "good enough" AI architecture.
By "AGI", I mean the good old "human equivalence" proxy. An AI that can accomplish any intellectual task that can be accomplished by a human. LLMs are probably sufficient for that. They have limitations, but not the kind that can't be worked around with things like sharply applied tool use. Which LLMs can be trained for, and are.
So far, all the weirdo architectures that try to replace transformers, or put brain-inspired features into transformers, have failed to live up to the promise. Which sure hints that the bottleneck isn't architectural at all.
I'm not aware of any architectures that have tried to put "brain-inspired" features into Transformers, or of many attempts to modify them much at all, for that matter.
The architectural Transformer tweaks that we've seen are:
- Various versions of attention for greater efficiency
- MoE vs. dense layers for greater efficiency (sketched below)
- Mamba (SSM) + Transformer hybrids for greater efficiency
None of these are even trying to fundamentally change what the Transformer is doing.
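For what it's worth, here's roughly what the MoE tweak in that list amounts to: a dense FFN runs every token through the same weights, while an MoE layer routes each token to a small subset of experts, so per-token compute stays flat while total parameters grow. A toy top-1-routing sketch in PyTorch follows, with all dimensions made up; it's not how any production model actually implements routing, just the shape of the idea.

```python
import torch
import torch.nn as nn

class Top1MoE(nn.Module):
    """Toy mixture-of-experts FFN: each token is routed to one expert,
    so per-token compute is roughly that of a single smaller FFN."""
    def __init__(self, d_model=64, d_ff=256, n_experts=4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                   # x: (tokens, d_model)
        gates = self.router(x).softmax(-1)  # (tokens, n_experts)
        top = gates.argmax(-1)              # chosen expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top == i
            if mask.any():
                # Weight by the gate so routing stays differentiable.
                out[mask] = expert(x[mask]) * gates[mask, i].unsqueeze(-1)
        return out

x = torch.randn(8, 64)
print(Top1MoE()(x).shape)  # torch.Size([8, 64])
```

Note it's still just a Transformer FFN block with a gate in front; the attention mechanism and the overall next-token-prediction setup are untouched, which is the point being made above.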
Yeah, the x86 architecture is certainly a bit of a mess, but, as you say, good enough, as long as what you want to do is run good old-fashioned symbolic computer programs. However, if you want to run these new-fangled neural nets, then you'd be better off with a GPU or TPU.
> By "AGI", I mean the good old "human equivalence" proxy. An AI that can accomplish any intellectual task that can be accomplished by a human. LLMs are probably sufficient for that.
I think DeepMind are right here, and you're wrong, but let's wait another year or two and see, eh?