Comment by lukev
2 years ago
The overall structure is the same as in "use statistics to predict the next token."
With a Markov chain, the statistics are as simple as a mapping from n-grams to the number of times each appears in the corpus.
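To make that concrete, here's a minimal sketch of what "n-gram counts as statistics" looks like in practice (the toy corpus and bigram order are mine, not from the comment):

```python
from collections import Counter, defaultdict
import random

def train(tokens, n=2):
    # Count how often each token follows each (n-1)-gram context.
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        ctx = tuple(tokens[i:i + n - 1])
        counts[ctx][tokens[i + n - 1]] += 1
    return counts

def next_token(counts, ctx):
    # Sample the next token in proportion to its observed count.
    dist = counts[tuple(ctx)]
    toks, weights = zip(*dist.items())
    return random.choices(toks, weights=weights)[0]

corpus = "the cat sat on the mat the cat ran".split()
model = train(corpus, n=2)
next_token(model, ["the"])  # "cat" or "mat", weighted 2:1 by count
```

That's the whole model: a count table and a weighted draw.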
With an LLM, the statistics are the result of 50 years of research into neural network architectures, terabytes of training data, and many millions of dollars' worth of hardware, along with the teams that build and manage all the data pipelines.
So yes, much more complicated.
You can have very complex calculations and a simple output. The complexity of the process that finds the weights is not necessarily the same as the complexity of the process that uses them. The numbers don't suddenly become special because of how they were calculated (42 is just the number 42, even after 7 million years of calculation).
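One way to see the point: at inference time, a weight vector is used the same way no matter where it came from. A toy sketch (the values and the linear `predict` function are made up for illustration):

```python
def predict(weights, features):
    # Inference is just arithmetic on numbers; it cannot "see"
    # whether the weights came from expensive training or a text editor.
    return sum(w * x for w, x in zip(weights, features))

trained = [0.21, -1.3, 0.07]   # pretend these came from a costly training run
handmade = [0.21, -1.3, 0.07]  # identical numbers, typed in by hand
x = [1.0, 2.0, 3.0]

predict(trained, x) == predict(handmade, x)  # True: same numbers, same output
```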