Comment by bananapub
2 years ago
> can intuit the rules of the game from those examples,
I am pretty sure a bunch of matrix multiplications can't intuit anything.
Naively, it doesn't seem very surprising that enormous amounts of self-play would cause the internal structure to reflect the inputs and outputs?
It's not self-play. It's literally just reading sequences of moves. And it doesn't even know that they're moves, or that it's supposed to be learning a game. It's just learning to predict the next token given a sequence of previous tokens.
What's kind of amazing is that, in doing so, it actually learns to play chess! That is, the model weights naturally organize into something resembling an understanding of chess, just by trying to minimize error on next-token prediction.
It makes sense, but it's still kind of astonishing that it actually works.
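For concreteness, here's a minimal sketch (PyTorch, with a toy made-up dataset; not the actual setup being discussed) of what "just learning to predict the next token given a sequence of moves" looks like: the model only ever sees move tokens and a cross-entropy loss on the next one, never a board and never the rules.

```python
# Minimal next-token prediction over chess moves written as plain text.
# The games list is an invented stand-in for a real corpus.
import torch
import torch.nn as nn

games = ["e4 e5 Nf3 Nc6 Bb5 a6", "d4 d5 c4 e6 Nc3 Nf6"]  # toy data
vocab = sorted({move for g in games for move in g.split()})
stoi = {m: i for i, m in enumerate(vocab)}

class NextMoveModel(nn.Module):
    def __init__(self, vocab_size, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, tokens):               # tokens: (batch, seq)
        h, _ = self.rnn(self.embed(tokens))  # contextual states per position
        return self.head(h)                  # logits for the *next* token

model = NextMoveModel(len(vocab))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for game in games:
    ids = torch.tensor([[stoi[m] for m in game.split()]])
    logits = model(ids[:, :-1])              # predict token t+1 from tokens <= t
    loss = loss_fn(logits.reshape(-1, len(vocab)), ids[:, 1:].reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
```

Everything the weights end up "knowing" about chess has to come from minimizing that one loss.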
> I am pretty sure a bunch of matrix multiplications can't intuit anything.
I don't understand how people can say things like this when universal approximation is an easy thing to prove. You could reproduce Magnus Carlsen's exact chess-playing stochastic process with a bunch of matrix multiplications and nonlinear activations, up to arbitrarily small error.
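As a toy illustration of the approximation claim (a made-up regression target, not anyone's chess policy): a single hidden layer of matrix multiplications plus a nonlinearity drives the error on a continuous target down as width and training grow.

```python
# One hidden layer approximating a continuous function to small error.
import torch
import torch.nn as nn

x = torch.linspace(-3, 3, 512).unsqueeze(1)
y = torch.sin(2 * x)                      # stand-in for any continuous target

net = nn.Sequential(nn.Linear(1, 256), nn.Tanh(), nn.Linear(256, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

for step in range(2000):
    loss = ((net(x) - y) ** 2).mean()     # plain squared error
    opt.zero_grad(); loss.backward(); opt.step()

print(f"max |error| = {(net(x) - y).abs().max().item():.4f}")  # shrinks with width/training
```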
I read such statements as claims that "intuition" is part of consciousness, etc.
It's still too strong a claim, given that matrix multiplication also describes quantum mechanics, and by extension chemistry, and by extension biology, and by extension our own brains… but I frequently encounter people mistaking two related concepts for synonyms, and I assume in this case it is meant as the weaker claim that LLMs are not conscious.
Me, I think the word "intuition" is fine, just like I'd say that a tree falling in a forest with no one to hear it does produce a sound, because sound is the vibration of the air rather than the qualia of hearing it.
Funnily enough, for me intuition is the part of intelligence that I can most easily imagine being done by a neural network. When my intuition says this person is not to be trusted, I can easily imagine that being something like a simple hyperplane classification in situation space.
It's the active, iterative thinking and planning that is more critical for AGI and, while obviously theoretically possible, much harder to imagine a neural network performing.
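Something like this, roughly (the features and data are invented purely for illustration): a learned hyperplane over a handful of "situation" features already gives you a trust/distrust judgement.

```python
# A logistic-regression-style hyperplane classifier in "situation space".
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))             # 4 hypothetical situational features
w_true = np.array([1.5, -2.0, 0.5, 0.0])  # the "real" hyperplane we hope to recover
y = (X @ w_true > 0).astype(float)        # trust (1) vs. distrust (0)

w = np.zeros(4)
for _ in range(500):                       # plain gradient descent on logistic loss
    p = 1 / (1 + np.exp(-(X @ w)))
    w -= 0.1 * X.T @ (p - y) / len(y)

print("learned direction:", np.round(w / np.linalg.norm(w), 2))
```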
No, matrix multiplication is the formalism humans use to make predictions about those things, but it doesn't describe their fundamental structure, and there's no reason to imply that it does.
This simply isn't true. There are big caveats to the idea that neural networks are universal function approximators (as there are to the idea that they're universal Turing machines, which also somehow became common knowledge in our post-ChatGPT world): the function has to be continuous; we're talking about functions rather than algorithms; an approximator being possible and us knowing how to construct it are very different things; and so on.
> The function has to be continuous.
That's not a problem. You can show that neural-network-induced functions are dense in plenty of function spaces, just like continuous functions are. Regularity is not a critical concern anyway.
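For reference, the classical density result being gestured at (a Cybenko/Hornik-style statement, given informally; the exact hypotheses vary by paper):

```latex
% Universal approximation, informal statement: for a continuous,
% non-polynomial activation \sigma and compact K \subset \mathbb{R}^n,
% single-hidden-layer networks are dense in C(K) under the sup norm.
\[
\forall f \in C(K),\ \forall \varepsilon > 0\ \ \exists N,\ \{a_i, b_i \in \mathbb{R},\ w_i \in \mathbb{R}^n\}:
\quad \sup_{x \in K} \Bigl| f(x) - \sum_{i=1}^{N} a_i\, \sigma(w_i^\top x + b_i) \Bigr| < \varepsilon .
\]
```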
> functions vs. algorithms
Repeatedly applying arbitrary functions to a memory (as in a transformer) yields arbitrary dynamical systems, so we can do algorithms too.
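A toy version of that point (plain Python, nothing learned): fix a single update function and apply it repeatedly to a memory, and you get an algorithm. Here the fixed step is one bubble-sort pass, and iterating it sorts the memory; a transformer layer would be a learned, continuous analogue of `step`.

```python
# Iterating one fixed update rule over a "memory" carries out an algorithm.
def step(memory):
    mem = list(memory)
    for i in range(len(mem) - 1):          # one fixed, position-wise update
        if mem[i] > mem[i + 1]:
            mem[i], mem[i + 1] = mem[i + 1], mem[i]
    return mem

memory = [5, 1, 4, 2, 3]
for _ in range(len(memory)):               # repeated application of the same function
    memory = step(memory)
print(memory)  # [1, 2, 3, 4, 5]
```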
> an approximator being possible and us knowing how to construct it are very different things,
This is of course the critical point, but not so relevant when asking whether something is theoretically possible. The way I see it, this was the big question for deep learning, and over the last decade the evidence has continually grown that SGD is VERY good at finding weights that do in fact generalize quite well, and that don't just approximate a function out of step functions the way you'd imagine an approximation theorem constructing it, but instead efficiently find features in the intermediate layers and reuse them for multiple purposes, etc. My intuition is that the gradient in high dimensions doesn't just decrease the loss a bit in the way we picture it for a low-dimensional plot; in those high dimensions it really finds directions that are immensely efficient at decreasing loss. This is how transformers can become so extremely good at memorization.
We really need a list of verbs we're allowed to use when talking about computers, and verbs that belong in the magic human/animal-only section.
You are probably joking, but I think it's actually very important to look at the language we use around LLMs, in order not to get stuck in assumptions and sociological bias associated with a vocabulary usually reserved for "magical" beings, as it were.
This goes both ways, by the way. I could be convinced that LLMs can achieve something like intuition, but I strongly believe it is a very different kind of intuition than the one we normally associate with humans and animals. Using the same label is thus potentially confusing, and (human pride aside) might even prevent us from appreciating the full scope of what LLMs are capable of.
I think the issue is that we're suddenly trying to pin down something that was previously fine being loosely understood, but without any new information.
If someone came to the table with "intuition is the process of a system inferring a likely outcome from given inputs by process X - not to be confused with matmultuition, which is process Y", that might be a reasonable proposal.
> Naively, it doesn't seem very surprising that enormous amounts of self-play would cause the internal structure to reflect the inputs and outputs?
Right. Wait, are you talking about AI or humans?
Can a bunch of neurons firing based on chemical and electrical triggers intuit anything? It has to be the case that any intelligent process must be the emergent result of non-intelligent processes, because intelligence is not an inherent property of anything.
What does "intuit" mean to you, then?