Comment by skrebbel

2 years ago

The way you describe it, it doesn't seem much more complicated to me, from a “how does it work” perspective, just way bigger.

The overall structure is the same as in "use statistics to predict the next token."

With a Markov chain, the statistics are as simple as a mapping from n-grams to the number of times they appear in the corpus.
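As a minimal sketch of that counting step (a hypothetical bigram model, i.e. n = 2; the corpus and function names are just for illustration):

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count, for each token, how often each next token follows it."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(tokens, tokens[1:]):
        counts[cur][nxt] += 1
    return counts

def predict(counts, token):
    """Predict the most frequent next token observed after `token`."""
    return counts[token].most_common(1)[0][0]

corpus = "the cat sat on the mat the cat ran".split()
model = train_bigram(corpus)
print(predict(model, "the"))  # "cat" follows "the" twice, "mat" once -> "cat"
```

The whole "training" step is just counting; prediction is a table lookup.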

With an LLM, the statistics are the result of 50 years of research in neural network architectures, terabytes of training data, and many millions of dollars' worth of hardware, along with the teams to build and manage all the data pipelines.

So yes, much more complicated.

  • You can have very complex calculations and a simple output. The complexity of the process that finds the weights is not necessarily the same as the complexity of the process that uses them. The numbers don't suddenly become special because of how they were calculated (42 is just the number 42, even after 7 million years of calculations).
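To put that point concretely: once the weights exist, using them is plain arithmetic. A hypothetical sketch (the weight values here are made up, standing in for whatever an expensive training process produced):

```python
# A trained "model" is just numbers; using it is simple arithmetic,
# regardless of how costly it was to find those numbers.
weights = [0.5, -1.0, 2.0]  # hypothetical weights, however they were computed
bias = 0.1

def forward(x):
    """One linear neuron: dot product of inputs and weights, plus bias."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

print(forward([1.0, 2.0, 3.0]))  # 0.5 - 2.0 + 6.0 + 0.1 = 4.6
```

The forward pass is the same trivial computation whether the weights took seconds or millions of GPU-hours to find.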

If the minimal representation of a model of the behavior is "way bigger", why are you disputing that it's more complicated? What's the difference?

  • The difference is the capabilities. LLMs don't necessarily need billions of parameters. In fact, useful models (like the one used in newer Apple devices' autocomplete) have only around 50 million. Markov chains... I guess there is probably a reason why we don't use them instead of neural networks. Maybe somebody more knowledgeable can enlighten us, but I suspect one might need orders of magnitude more parameters.

    • Well, neural networks (when sized appropriately) are able to memorize the observed Markov chains, and more, really.