Comment by HarHarVeryFunny

1 month ago

You keep using the term "AGI" without defining what you mean by it, other than implicitly defining it as "whatever can be achieved without changing the Transformer architecture", which makes your "claim" just a definitional tautology. That's fine, but it does mean you are talking about something different than what I am talking about, which is also fine.

> And none of the "brain-like" architectures are actually much better at "being a brain" than LLMs are

I've no idea what projects you are referring to.

It would certainly be bizarre if the Transformer architecture, never designed to be a brain, turned out to be the best brain we can come up with, equal to real brains, which have many more moving parts, each evolved over millions of years to fill a need and improve capability.

Maybe you are smarter than Demis Hassabis and the DeepMind team, and all their work towards AGI (their version, not yours) will be a waste of effort. Why not send him a note: "hey, dumbass, Transformers are all you need!"?

It would certainly be bizarre if the 8086 architecture, never designed to be the foundation of all home, office and server computation, was the best CPU architecture ever made.

And it isn't. It's merely good enough.

That's what LLMs are. A "good enough" AI architecture.

By "AGI", I mean the good old "human equivalence" proxy. An AI that can accomplish any intellectual task that can be accomplished by a human. LLMs are probably sufficient for that. They have limitations, but not the kind that can't be worked around with things like sharply applied tool use. Which LLMs can be trained for, and are.

So far, all the weirdo architectures that try to replace transformers, or put brain-inspired features into transformers, have failed to live up to the promise. Which sure hints that the bottleneck isn't architectural at all.

  • I'm not aware of any architectures that have tried to put "brain-inspired" features into Transformers, or of much attempt to modify them at all, for that matter.

    The architectural Transformer tweaks that we've seen are:

    - Various versions of attention for greater efficiency

    - MoE (mixture of experts) vs dense for greater efficiency

    - Mamba (SSM) + Transformer hybrid for greater efficiency

    None of these are even trying to fundamentally change what the Transformer is doing.
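
    To show how drop-in these tweaks are, here's a rough PyTorch-style sketch (names and hyperparameters made up) of the standard dense feed-forward sublayer next to an MoE version. The block skeleton around it (attention + FFN with residuals) is unchanged; the MoE just routes each token to a couple of smaller expert FFNs instead of one big one, which is an efficiency trade, not a different kind of computation.

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DenseFFN(nn.Module):
        """The standard Transformer feed-forward sublayer."""
        def __init__(self, d_model: int, d_ff: int):
            super().__init__()
            self.up = nn.Linear(d_model, d_ff)
            self.down = nn.Linear(d_ff, d_model)

        def forward(self, x):
            return self.down(F.gelu(self.up(x)))

    class MoEFFN(nn.Module):
        """Drop-in replacement: route each token to its top-k experts, so
        only a fraction of the parameters run per token. Same role in the
        block, cheaper per token."""
        def __init__(self, d_model: int, d_ff: int, n_experts: int = 8, k: int = 2):
            super().__init__()
            self.router = nn.Linear(d_model, n_experts)
            self.experts = nn.ModuleList(
                [DenseFFN(d_model, d_ff) for _ in range(n_experts)])
            self.k = k

        def forward(self, x):                        # x: (tokens, d_model)
            scores = self.router(x)                  # (tokens, n_experts)
            weights, idx = scores.topk(self.k, dim=-1)
            weights = weights.softmax(dim=-1)        # mix the chosen experts
            out = torch.zeros_like(x)
            for slot in range(self.k):               # naive loops; real MoEs batch this
                for e, expert in enumerate(self.experts):
                    mask = idx[:, slot] == e
                    if mask.any():
                        out[mask] += weights[mask, slot, None] * expert(x[mask])
            return out

    # Either sublayer slots into the same block:
    #   x = x + attention(norm1(x))
    #   x = x + ffn(norm2(x))     # ffn is DenseFFN or MoEFFN
    x = torch.randn(16, 64)
    print(MoEFFN(64, 256)(x).shape)   # torch.Size([16, 64])
    ```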

    Yeah, the x86 architecture is certainly a bit of a mess, but, as you say, good enough, as long as what you want to do is run good old-fashioned symbolic computer programs. However, if you want to run these new-fangled neural nets, then you'd be better off with a GPU or TPU.

    > By "AGI", I mean the good old "human equivalence" proxy. An AI that can accomplish any intellectual task that can be accomplished by a human. LLMs are probably sufficient for that.

    I think DeepMind are right here, and you're wrong, but let's wait another year or two and see, eh?