← Back to context

Comment by HarHarVeryFunny

7 hours ago

That depends on how you define AGI - it's a meaningless term to use since everyone uses it to mean different things. What exactly do you mean ?!

Yes, there is a lot that can be improved via different training, but at what point is it no longer a language model (i.e. something that auto-regressively predicts language continuations)?

I like to use an analogy to the children's "Stone Soup" story whereby a "stone soup" (starting off as a stone in a pot of boiling water) gets transformed into a tasty soup/stew by strangers incrementally adding extra ingredients to "improve the flavor" - first a carrot, then a bit of beef, etc. At what point do you accept that the resulting tasty soup is not in fact stone soup?! It's like taking an auto-regressively SGD-trained Transformer, and incrementally tweaking the architecture, training algorithm, training objective, etc, etc. At some point it becomes a bit perverse to choose to still call it a language model

Some of the "it's just training" changes that would be needed to make today's LLMs more brain-like may be things like changing the training objective completely from auto-regressive to predicting external events (with the goal of having it be able to learn the outcomes of it's own actions, in order to be able to plan them), which to be useful would require the "LLM" to then be autonomous and act in some (real/virtual) world in order to learn.

Another "it's just training" change would be to replace pre/mid/post-training with continual/incremental runtime learning to again make the model more brain-like and able to learn from it's own autonomous exploration of behavior/action and environment. This is a far more profound, and ambitious, change than just fudging incremental knowledge acquisition for some semblance of "on the job" learning (which is what the AI companies are currently working on).

If you put these two "it's just training/learning" enhancements together then you've now got something much more animal/human-like, and much more capable than an LLM, but it's already far from a language model - something that passively predicts next word every time you push the "generate next word" button. This would now be an autonomous agent, learning how to act and control/exploit the world around it. The whole pre-trained, same-for-everyone, model running in the cloud, would then be radically different - every model instance is then more like an individual learning based on it's own experience, and maybe you're now paying for compute for the continual learning compute rather than just "LLM tokens generated".

These are "just" training (and deployment!) changes, but to more closely approach human capability (but again, what to you mean by "AGI"?) there would also need to be architectural changes and additions to the "Transformer" architecture (add looping, internal memory, etc), depending on exactly how close you want to get to human/animal capability.