Comment by otabdeveloper4
2 months ago
> it’s more complicated than that.
No it isn't.
> ...fool you into thinking you understand what is going on in that trillion parameter neural network.
It's just matrix multiplication and logistic regression, nothing more.
LLMs are a general-purpose computing paradigm. LLMs are circuit builders: the converged parameters define pathways through the architecture that pick out specific programs. Or, as Karpathy puts it, an LLM is a differentiable computer[1]. Training an LLM discovers programs that reproduce the input sequences well. Roughly the same architecture can generate passable images, music, or even video.
The sequence of matrix multiplications is the high-level constraint on the space of discoverable programs. But the specific parameters found during training are what determine the flow of information through the network, and hence what program is defined. The complexity of the trained network is emergent: the internal complexity far surpasses that of the coarse-grained description as a sequence of matmuls. LLMs are not just matmuls and logits.
[1] https://x.com/karpathy/status/1582807367988654081
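To make the "parameters pick out the program" point concrete, here's a toy sketch (my own illustration, not from the linked tweet; every name and weight value is made up for the example): one fixed matmul architecture, two hand-set weight configurations, two different computed functions.

```python
# One fixed architecture, two hand-picked weight settings, two different
# "programs". All names and values here are illustrative.
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """One fixed matmul architecture: y = W2 @ relu(W1 @ x)."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(2, 4, bias=False)
        self.fc2 = nn.Linear(4, 1, bias=False)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

net = TinyNet()
x = torch.tensor([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# Weight setting A: for the non-negative inputs here, the network
# computes x0 + x1 (a "sum" program).
with torch.no_grad():
    net.fc1.weight.copy_(torch.tensor([[1., 0.], [0., 1.], [0., 0.], [0., 0.]]))
    net.fc2.weight.copy_(torch.tensor([[1., 1., 0., 0.]]))
print(net(x).squeeze())  # tensor([1., 1., 2.])

# Weight setting B: the exact same matmuls now compute max(x0, x1),
# using max(a, b) = b + relu(a - b) (with relu(x1) = x1 since x1 >= 0 here).
with torch.no_grad():
    net.fc1.weight.copy_(torch.tensor([[1., -1.], [0., 1.], [0., 0.], [0., 0.]]))
    net.fc2.weight.copy_(torch.tensor([[1., 1., 0., 0.]]))
print(net(x).squeeze())  # tensor([1., 1., 1.])
```

Same matmul sequence both times; only the parameters changed, and with them the program.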
> LLMs are a general purpose computing paradigm.
Yes, so is logistic regression.
No, not at all.
You really think I didn't already know how LLMs are put together when I wrote my comment? I've implemented these things from scratch in PyTorch. Of course I know the building blocks.
And if you want to get pedantic and technical, you didn't even get the reductionism right! Modern LLMs don't use the logistic-regression sigmoid as their activation nonlinearity anymore; they use things like ReLU or GELU. You're about 15 years behind.
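For the record, here's a minimal sketch of the point (my own illustration; the 768/3072 dimensions are GPT-2-ish assumptions, not anything claimed upthread): the feed-forward block in a modern transformer, where the nonlinearity is GELU rather than the logistic sigmoid.

```python
# A GPT-style feed-forward block. The nonlinearity is GELU, not the
# logistic sigmoid of classic logistic regression. Dimensions are
# illustrative (roughly GPT-2 sized).
import torch
import torch.nn as nn

class FeedForward(nn.Module):
    def __init__(self, d_model: int = 768, d_ff: int = 3072):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)    # expand to the hidden width
        self.act = nn.GELU()                  # the modern activation choice
        self.down = nn.Linear(d_ff, d_model)  # project back down

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(self.act(self.up(x)))

block = FeedForward()
tokens = torch.randn(1, 16, 768)   # (batch, sequence, d_model)
print(block(tokens).shape)         # torch.Size([1, 16, 768])
```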
Reductionism is counterproductive in biology ("human brains are voltage spikes across membranes, nothing more") and it's counterproductive here as well. LLMs have nontrivial emergent behavior. The interesting questions are all about what that behavior is and how it arises in the network during training, and if you refuse to engage beyond bare reductionism, you won't even be able to ask those questions, let alone answer them.