Comment by golol
2 years ago
> I am pretty sure a bunch of matrix multiplications can't intuit anything.
I don't understand how people can say things like this when universal approximation is an easy thing to prove. You could reproduce Magnus Carlsen's exact chess-playing stochastic process with a bunch of matrix multiplications and nonlinear activations, up to arbitrarily small error.
I read such statements as being claims that "intuition" is part of consciousness etc.
It's still too strong a claim given that matrix multiplication also describes quantum mechanics and by extension chemistry and by extension biology and by extension our own brains… but I frequently encounter examples of mistaking two related concepts for synonyms, and I assume in this case it is meant to be a weaker claim about LLMs not being conscious.
Me, I think the word "intuition" is fine, just like I'd say that a tree falling in a forest with no one to hear it does produce a sound because sound is the vibration of the air instead of the qualia.
Funnily, for me intuition is the part of intelligence which I can more easily imagine as being done by a neural network. When my intuition says this person is not to be trusted, I can easily imagine that being something like a simple hyperplane classification in situation space.
It's the active, iterative thinking and planning that is more critical for AGI and, while obviously theoretically possible, much harder to imagine a neural network performing.
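To make the "simple hyperplane classification in situation space" picture concrete, here is a minimal sketch. The feature names, weights, and bias are all made up for illustration; nothing here is a claim about how real intuition (or any real model) encodes situations.

```python
# Hypothetical "situation space": each situation is a small feature vector.
# Assumed features (invented for this sketch):
#   (inconsistency of their story, pressure to decide quickly, prior good experiences)

def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def hyperplane_classify(x, w, b):
    """Return True if x lies on the positive side of the hyperplane w.x + b = 0."""
    return dot(w, x) + b > 0

# Made-up weights: the first two features push toward distrust,
# prior good experiences push away from it.
w = [1.5, 1.0, -2.0]
b = -0.5

situations = {
    "old friend": [0.1, 0.0, 0.9],
    "pushy stranger": [0.8, 0.9, 0.0],
}

# True means "my gut says don't trust this person".
distrust = {name: hyperplane_classify(x, w, b) for name, x in situations.items()}
print(distrust)
```

A single linear threshold like this is about the simplest thing a network layer can do, which is the sense in which this kind of snap judgment is easy to picture.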
No, matrix multiplication is the system humans use to make predictions about those things, but it doesn’t describe their fundamental structure and there’s no reason to imply they do.
> describe their fundamental structure
That is literally, literally, what it does.
One may argue that it does so wrongly, but that's a different claim entirely.
> there’s no reason to imply they do
The predictions matching reality to the best of our collective abilities to test them is such a reason.
The saying that "all models are wrong but some are useful" is a reason against that.
This simply isn't true. There are big caveats to the idea that neural networks are universal function approximators (as there are to the idea that they're universal Turing machines, which also somehow became common knowledge in our post-ChatGPT world). The function has to be continuous, we're talking about functions rather than algorithms, an approximator being possible and us knowing how to construct it are very different things, and so on.
> The function has to be continuous.
That's not a problem. You can show that neural-network-induced functions are dense in a bunch of function spaces, just like continuous functions are. Regularity is not a critical concern anyways.
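As a rough illustration of the density point, here is a sketch in which a two-unit ReLU network approximates a discontinuous step function on [0, 1]. The target function, the network form, and the steepness parameter `k` are my own choices for the example. The error at the jump never drops below about one half pointwise, but the L1 error shrinks like 0.25 / k, which is the sense of "approximation" that matters in spaces like L^p.

```python
def relu(z):
    return z if z > 0 else 0.0

def step(x):
    """Discontinuous target: jumps from 0 to 1 at x = 0.5."""
    return 1.0 if x >= 0.5 else 0.0

def relu_net(x, k):
    """One hidden layer with two ReLU units: a ramp from 0 to 1 of width 1/k around 0.5."""
    t = k * (x - 0.5)
    return relu(t + 0.5) - relu(t - 0.5)

def l1_error(k, n=20000):
    """Midpoint-rule estimate of the integral of |step - relu_net| over [0, 1]."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += abs(step(x) - relu_net(x, k))
    return total * h

# Steeper ramps (larger k) give a smaller L1 error.
errors = {k: l1_error(k) for k in (1, 10, 100, 1000)}
for k, err in errors.items():
    print(k, err)
```

The weights here are picked by hand rather than learned, so this only speaks to existence, not to whether training would find them.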
> functions vs algorithms
Repeatedly applying arbitrary functions to a memory (like in a transformer) yields you arbitrary dynamical systems, so we can do algorithms too.
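A toy version of that idea: one fixed function applied over and over to a small memory. In this sketch the function is hand-written (one step of Euclid's algorithm) rather than a learned network, and the names `step` and `run` are just mine; the point is only that iterating a fixed map on state gives you an algorithm, not merely a single function evaluation.

```python
def step(state):
    """One fixed update applied to the memory (a, b): a single step of Euclid's algorithm."""
    a, b = state
    if b == 0:
        return (a, 0)  # fixed point: the computation has halted
    return (b, a % b)

def run(state, n_steps):
    """Apply the same map n_steps times, like stacking identical layers over a memory."""
    for _ in range(n_steps):
        state = step(state)
    return state

# After enough iterations the first slot holds gcd(1071, 462).
final_state = run((1071, 462), 10)
print(final_state)
```

In the neural version you would replace `step` with a network that approximates it, and the number of iterations plays the role of depth or of generated tokens.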
> an approximator being possible and us knowing how to construct it are very different things,
This is of course the critical point, but not so relevant when asking whether something is theoretically possible. The way I see it, this was the big question for deep learning, and over the last decade the evidence has just continually grown that SGD is VERY good at finding weights that do in fact generalize quite well. Those weights don't just approximate a function from step functions the way you imagine an approximation theorem constructing it, but instead efficiently find features in the intermediate layers and use them for multiple purposes, etc. My intuition is that the gradient in high dimensions doesn't just decrease the loss a bit in the way we imagine it for a low-dimensional plot, but really finds directions that are immensely efficient at decreasing loss. This is how transformers can become so extremely good at memorization.