Comment by GuB-42

2 years ago

The big difference is the "deep" in "deep learning".

As the article says, Markov chains can be thought of as linear operations, which are very limited. The same limitation is the reason early neural networks (single-layer perceptrons) weren't taken seriously: notoriously, you couldn't implement an XOR operation with them.
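To make the XOR point concrete, here's a minimal numpy sketch (my own illustration, not from the article): the best possible linear fit to the XOR truth table is no better than guessing, it just predicts 0.5 for every input.

```python
import numpy as np

# XOR truth table
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 1., 1., 0.])

# Best linear model w1*x1 + w2*x2 + b, found by least squares
A = np.hstack([X, np.ones((4, 1))])  # append a bias column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = A @ coef

print(pred)  # [0.5 0.5 0.5 0.5] -- the optimal linear fit is constant
```

No choice of weights does better, because XOR's positive and negative examples are not linearly separable.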

It all changed when we got to multi-layer perceptrons trained with backpropagation. The key is the activation (transfer) function: it is non-linear, which lets the network compute infinitely more than what you can achieve with simple linear combinations, and it is differentiable, which makes learning by backpropagation possible. This is the theoretical foundation behind LLMs, diffusion models, classifiers, and so on: essentially everything we call "AI" today.

So, yes, by being non-linear, LLMs are deeply smarter than Markov chains (pun intended).