Comment by rwmj

2 years ago

Number of parameters is not the difference. A Markov chain can easily be a multi-dimensional matrix with millions of entries. The significant difference is that a length 3 Markov chain can only ever find connections between 3 adjacent symbols (words, usually). LLMs seem to be able to find and connect abstract concepts at very long and variable distances in the input.
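To make the limitation concrete, here is a minimal sketch of a length-3 (trigram) Markov chain text generator. The corpus and seed are invented for illustration; the point is that each next-word choice can only see the previous two words, never anything further back:

```python
import random
from collections import defaultdict

def train_trigram(words):
    """Map each pair of adjacent words to the words that follow them."""
    model = defaultdict(list)
    for a, b, c in zip(words, words[1:], words[2:]):
        model[(a, b)].append(c)
    return model

def generate(model, seed, n=10):
    """Extend the seed pair; each step conditions only on the last two words."""
    a, b = seed
    out = [a, b]
    for _ in range(n):
        followers = model.get((a, b))
        if not followers:
            break
        c = random.choice(followers)
        out.append(c)
        a, b = b, c
    return out

corpus = "the cat sat on the mat and the cat ran".split()
model = train_trigram(corpus)
print(" ".join(generate(model, ("the", "cat"))))
```

No matter how large the corpus (and hence the matrix) gets, the `(a, b)` key is the entire context the model ever sees.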

Nevertheless I agree with the premise of the post. I used Markov chains recently to teach someone what a statistical model of language is, followed by explaining the perceptron to them, and then (hand-waving a bit) explaining how many large, deep layers scale everything up massively.

Seemingly? Is there not a direct technical reason to compare?

  • The people I was instructing are not very technical, so I had to hand-wave a lot. (Nevertheless I think they got a much better overview of the tech than they would have got by reading some pop-sci description.)