Comment by Insanity

4 days ago

Whenever I see claims about AGI being reachable through large language models, it reminds me of the miasma theory of disease. Many respectable medical professionals were convinced this was true, and they viewed the entire world through this lens. They interpreted data in ways that aligned with a miasmatic view.

Of course now we know this was delusional and it seems almost funny in retrospect. I feel the same way when I hear that 'just scale language models' suddenly created something that's true AGI, indistinguishable from human intelligence.

Is AGI about replicating human intelligence though? Like, human intelligence comes with its own defects, and whatever we fantasize about an intelligence free of this or that defect is done from a perspective that is itself full of defects.

Collectively, at a global level, it seems we are just unable to avoid escalating conflicts into things as horrible as genocides. Individuals who have a remarkable ability to achieve technical feats can at the same time fall short of the most basic expectations in terms of empathy, which can also be considered a form of intelligence.

> Whenever I see claims about AGI being reachable through large language models, it reminds me of the miasma theory of disease.

Whenever I see people think the model architecture matters much, I think they have a magical view of AI. Progress comes from high-quality data; the models are good as they are now. Of course you can still improve the models, but you get much more upside from data, or even better - from interactive environments. The path to AGI is not based on pure thinking, it's based on scaling interaction.

To stay with the miasma theory analogy: if you think architecture is the key, then look at how humans dealt with pandemics... The Black Death in the 14th century killed as much as half of Europe, and no one could think of the germ theory of disease. Think about it - it was as desperate a situation as it gets, and no one had the simple spark to keep hygiene.

The fact is we are also not smart from the brain alone, we are smart from our experience. Interaction and environment are the scaffolds of intelligence, not the model. For example, 1B users do more for an AI company than a better model: they act like human-in-the-loop curators of LLM work.

  • If I'm understanding you, it seems like you're struck by hindsight bias. No one knew the miasma theory was wrong... it could have been right! Only with hindsight can we say it was wrong. Seems like we're in the same situation with LLMs and AGI.

    • The miasma theory of disease was "not even wrong" in the sense that it was formulated before we even had the modern scientific method to define the criteria for a theory in the first place. And it was sort of accidentally correct in that some non-infectious diseases are caused by airborne toxins.

    • > Only with hindsight can we say it was wrong

      It really depends what you mean by 'we'. Laymen? Maybe. But people said it was wrong at the time with perfectly good reasoning. It might not have been accessible to the average person, but that's hardly to say that only hindsight could reveal the correct answer.

  • It's unintuitive to me that architecture doesn't matter - deep learning models, for all their impressive capabilities, are still deficient compared to human learners as far as generalisation, online learning, representational simplicity and data efficiency are concerned.

    Just because RNNs and Transformers both work with enormous datasets doesn't mean that architecture/algorithm is irrelevant, it just suggests that they share underlying primitives. But those primitives may not be the right ones for 'AGI'.

  • > Of course you can still improve the models, but you get much more upside from data, or even better - from interactive environments.

    On the contrary, I believe that the hunt for better data is an attempt to climb the local hill and get stuck there without reaching the global maximum. Interactive environments are good, they can help, but they are just one of the possible ways to learn about causality. Is it the best way? I don't think so, it is the easier way: just throw money at the problem and eventually you'll get something that you'll claim to be the goal you chased all this time. And yes, it will have something in it you will be able to call "causal inference" in your marketing.

    But current models are notoriously difficult to teach. They eat an enormous amount of training data, a human needs much less. They eat an enormous amount of energy to train, a human needs much less. It means that the very approach is deficient. It should be possible to do the same with a tiny fraction of the data and money.

    > The fact is we are also not smart from the brain alone, we are smart from our experience. Interaction and environment are the scaffolds of intelligence, not the model.

    Well, I learned English almost all the way to B2 by reading books. I was too lazy to use a dictionary most of the time, so it was not interactive: I didn't even interact with a dictionary, I was just reading books. How many books did I read to get to B2? ~10 or so. Well, I read a lot of English on the Internet too, and watched some movies. But let's multiply 10 books by 10. Strictly speaking it was not B2: I was almost completely unable to produce English, and my pronunciation was not just bad, it was worse. Even now I sometimes stumble on words I cannot pronounce. Like, I know the words and I have mentally constructed a sentence with them, but I cannot say it, because I don't know how. So to pass B2 I spent some time practicing speaking, listening and writing. And learning some stupid topics like "travel" to have the vocabulary to talk about them at length.

    How many books does an LLM need to consume to get to B2 in a language unknown to it? How many audio recordings does it need to consume? A lifetime wouldn't be enough for me to read and/or listen to that much.

    If there were a human who needed to consume as much information as an LLM to learn, they would be the stupidest person in all of human history.

    • >With only instructional materials (a 500-page reference grammar, a dictionary, and ≈400 extra parallel sentences) all provided in context, Gemini 1.5 Pro and Gemini 1.5 Flash are capable of learning to translate from English to Kalamang— a Papuan language with fewer than 200 speakers and therefore almost no online presence—with quality similar to a person who learned from the same materials

      https://arxiv.org/abs/2403.05530

    • Are you asking how many books a large language model would need to read to learn a new language if it was only trained on a different language? Probably just 1 (the dictionary).

  • If model arch doesn't matter much, how come transformers changed everything?

    • Luck. RNNs, Mamba, S4, etc. can do it just as well for a given budget of compute and data. The larger the model, the less difference architecture makes. It will learn in any of the 10,000 variations that have been tried, and come within about 10-15% of the best. What you need is a data loop, or a data source of exceptional quality and size; data has more leverage. Architecture games are more about efficiency: some methods can be 10x more efficient than others.

The miasma theory of disease, though wrong, made lots of predictions that proved useful and productive. Swamps smell bad, so drain them; malaria decreases. Excrement in the street smells bad, so build sewage systems; cholera decreases. Florence Nightingale implemented sanitary improvements in hospitals inspired by miasma theory that improved outcomes.

It was empirical and, though ultimately wrong, useful. Apply as you will to theories of learning.