Comment by YeGoblynQueenne

2 years ago

>> Thinking about it as "text data" is both your and Chomsky's problem -- the petabytes of data aren't preprocessed into text. They're streams of sensory input. It's not zero shot if it's years of data of observing human behavior through all your senses.

I'm a little unsure what you mean. I think you mean that humans learn language not just from examples of language, but from examples of all kinds of concepts in our sensory input, not just language?

Well, that may or may not be the case for humans, but it's certainly not the case for machine learning systems. Machine learning systems must be trained with examples of a particular concept, in order to learn that concept and not another. For instance, language models must be trained with examples of language, otherwise they can't learn language.

There are multi-modal systems that are trained on multiple "modalities", but they still cannot learn concepts for which they are given no specific examples. For instance, if a system is trained on examples of images, text and time series, it will learn a model of images, text and time series, but it won't be able to recognise, say, speech.

As to whether humans learn that way: who says we do? Is that just a conjecture proposed to support your other points, or is it something you actually believe to be the case, based on some observation?

I think you’re missing the meat of my point. The stuff LLMs are trained on is in no way similar to what human brains have received. It’s a shortcut to train them directly on text tokens. Because that’s the data we have easily available. But it doesn’t mean the principles of machine learning (which are loosely derived from how the brain actually works) apply only to text data or narrow categories of data like you mentioned. It just might require significantly more and different input data and compute power to achieve more generally intelligent results.

What I believe personally is that there is no reason to rule out that the basics of neural networks could serve as the foundation of artificial general intelligence. I think a lot of the criticism of this sort of technology as too crude to do so is missing the forest for the trees.

I have a brain and it learns and I’ve watched many other people learn too and I see nothing there that seems fundamentally distinct from how machine learning behaves in very general terms. It’s perfectly plausible that my brain has just trained itself on all the sensory data of my entire life and is using that to probabilistically decide the next impulse to send to my body in the same way an LLM predicts the most appropriate next word.
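To make the "predicts the most appropriate next word" analogy concrete, here is a toy next-word predictor built from bigram counts. This is a hypothetical, drastically simplified stand-in for what an LLM does (real models condition on long contexts with learned representations, not raw counts); the tiny corpus is made up for illustration:

```python
from collections import Counter, defaultdict

# Count which word follows which in a tiny training corpus.
corpus = "the cat sat on the mat the cat ran".split()
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def next_word(prev):
    # Most frequent continuation seen in training: a crude
    # maximum-likelihood estimate of P(next word | previous word).
    return bigrams[prev].most_common(1)[0][0]

print(next_word("the"))  # "cat" -- it follows "the" twice, "mat" follows it never
```

The sketch only illustrates the shape of the claim being debated: prediction here is purely a statistical function of what the model was trained on.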

  • >> But it doesn’t mean the principles of machine learning (which are loosely derived from how the brain actually works) apply only to text data or narrow categories of data like you mentioned.

    When you say "the principles of machine learning", I'd like to understand what you mean.

    If I were talking about "principles" of machine learning, I'd probably mean Leslie Valiant's Probably Approximately Correct Learning (PAC-Learning) setting [1] which is probably the most popular (because the most simple) theoretical framework of machine learning [2].

    Now, PAC-Learning theory is probably not what you mean when you say "principles of machine learning", nor any of the other theories of machine learning we have that formalise the learnability of classes of concepts. That's clear because none of those theories is "derived from how the brain actually works", loosely or not.

    Mind you, I don't know of any "principle" of machine learning that is really "derived" from how the brain actually works, for the simple reason that we don't know how the brain actually works.

    So, based on all this, I believe what you mean by "principles of machine learning" is some intuition you have about how _neural networks_ work. Those were originally defined according to the then-current understanding of how _neurons_ in the brain "work". That goes back to 1943 and McCulloch and Pitts's model of the artificial neuron [3], which Rosenblatt later developed into the Perceptron. That model is not used any more and hasn't been for many years.

    Still, if you are talking about neural networks, your intuition doesn't sound right to me. With neural nets, as with any other statistical learning approach, when we train on examples x of a class y, we learn the class y. If we want to learn classes y', y'', ... etc., we must train on examples x', x'', ... and so on. You have to train neural nets on examples of what you want them to learn; otherwise they won't learn it.
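    The point that a trained model's outputs are fixed by its training labels can be shown with a minimal sketch. This nearest-centroid classifier is written from scratch for illustration (it stands in for any supervised learner, not for any particular real system):

```python
# Minimal nearest-centroid classifier. The key property: the set of
# labels it can ever output is exactly the set seen during training.

def train(examples):
    # examples: list of (feature_vector, label) pairs
    sums, counts = {}, {}
    for x, y in examples:
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    # One centroid (mean vector) per label seen in training.
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def predict(centroids, x):
    # Returns the nearest trained label; it cannot emit a label
    # that never appeared in the training examples.
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(centroids, key=lambda y: dist(centroids[y]))

model = train([([0.0, 0.0], "cat"), ([1.0, 1.0], "dog")])
print(predict(model, [0.9, 1.2]))  # "dog" -- "bird" is simply not a possible output
```

    However sophisticated the learner, the same structural fact holds: no examples of a concept, no model of that concept.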

    The same goes for all of machine learning, following from PAC-Learning: a learner is given labelled instances of a concept, drawn from a distribution over a class of concepts, as training examples. The learner is said to learn the class if it can correctly label unseen instances of the class with some probability, up to some degree of error, with respect to the true labelling.
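    That criterion can be stated formally in the standard notation (ε is the error tolerance, δ the failure probability):

```latex
A concept class $C$ is PAC-learnable if there exists an algorithm $A$ such that
for every target concept $c \in C$, every distribution $D$ over the instance
space, and all $\epsilon, \delta \in (0, 1)$: given
$m \ge \mathrm{poly}(1/\epsilon,\, 1/\delta,\, \mathrm{size}(c))$
labelled examples drawn i.i.d.\ from $D$, $A$ outputs a hypothesis $h$ with
\[
  \Pr\big[\operatorname{err}_D(h) \le \epsilon\big] \ge 1 - \delta,
  \qquad \text{where } \operatorname{err}_D(h) = \Pr_{x \sim D}\big[h(x) \ne c(x)\big].
\]
```

    Note that the guarantee is always relative to the distribution $D$ the examples were drawn from: nothing in the definition promises anything about instances from a different distribution, let alone a different modality.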

    None of this says that you can train a neural net on images and have it learn to generate text, or vice versa, train it on text and have it recognise images. That is certainly not how any technology we have today works.

    Does the human brain work like that? Who knows? Nobody really knows how the brain works, let alone how it learns.

    So I don't think you're talking about any technology that we have right now, nor are you accurately extrapolating current technology into the future.

    If you are really curious about how all this stuff works, you should start by doing some serious reading: not blog posts and twitter, but scholarly articles. Start with the ones linked below. They are "ancient wisdom", but even researchers today are lost without them. The fact that most people don't have this knowledge (because where would they find it?) is probably why there is so much misunderstanding on the internet about what is going on with LLMs and what they may develop into in the long term.

    Of course, if you don't really care and you just want to have a bit of fun on the web, well, then, carry on. Everyone's doing that, at the moment.

    ____________

    [1] https://web.mit.edu/6.435/www/Valiant84.pdf

    [2] There's also Vladimir Vapnik's statistical learning theory, Rademacher complexity, and older frameworks like Learning in the Limit etc.

    [3] https://www.cs.cmu.edu/~./epxing/Class/10715/reading/McCullo...