Comment by crazy5sheep
17 hours ago
The whole point is that LLMs, especially the attention mechanism in transformers, have already paved the road to AGI. The main gap is the training data and its quality. Humans have generations of distilled knowledge — books, language, culture passed down over centuries. And on top of that we have the physical world — we watched birds fly, saw apples drop, touched hot things. Maybe we should train the base model with physical world data first, and then fine tune with the distilled knowledge.
Human life includes a lot of adversarial training (lying relatives) and training in temporal logics, which would seem to be a somewhat different domain than purely linguistic computations (e.g. staying up late, feeling bad; working hard at a task for months, getting better at it; feeling physical skills, even editing Go with emacs, move from the conscious layer into the cerebrellar layer). I think attention is a poor mans "OODA" loop; cognitive science is learning that a primary function of the brain is predicting what will be going on with the body in the immediate future, and prepping for it; that's not a thing that LLMs are architecturally positioned to do. Maybe swarms of agents (although in my mind that's more of a way to deal with LLM poor performance with large context of instructions (as opposed to large context of data) than a way to have contending systems fighting to make a decision for the overall entity), but they still lack both the real-time computational aspect and the continuously tricky problem of other people telling partially correct information.
There's plenty of training data, for a human. The LLM architecture is not as efficient as the brain; perhaps we can overcome that with enough twitter posts from PhDs, and enough YouTubes of people answering "why" to their four year olds and college lectures, but that's kind of an experimental question.
Starting a network out in a contrained body and have it learn how to control that, with a social context of parents and siblings would be an interesting experiment, especially if you could give it an inherent temporality and a good similar-content-addressable persistent memory. Perhaps a bit terrifying experiment, but I guess the protocols for this would be air-gapped, not internet connected with a credit card.