Comment by rwmj

2 years ago

Number of parameters is not the difference. A Markov chain can easily be a multi-dimensional matrix with millions of entries. The significant difference is that a length 3 Markov chain can only ever find connections between 3 adjacent symbols (words, usually). LLMs seem to be able to find and connect abstract concepts at very long and variable distances in the input.
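To make the limitation concrete, here is a minimal sketch of a length-3 (trigram) Markov chain text generator. The corpus and seed are invented for illustration; the point is that each next-word choice can only see the previous two words, never anything further back:

```python
import random
from collections import defaultdict

def train_trigram(words):
    """Map each pair of adjacent words to the words that follow them."""
    model = defaultdict(list)
    for a, b, c in zip(words, words[1:], words[2:]):
        model[(a, b)].append(c)
    return model

def generate(model, seed, n=10):
    """Extend the seed pair; each step conditions only on the last two words."""
    a, b = seed
    out = [a, b]
    for _ in range(n):
        followers = model.get((a, b))
        if not followers:
            break
        c = random.choice(followers)
        out.append(c)
        a, b = b, c
    return out

corpus = "the cat sat on the mat and the cat ran".split()
model = train_trigram(corpus)
print(" ".join(generate(model, ("the", "cat"))))
```

No matter how large the corpus (and hence the matrix) gets, the `(a, b)` key is the entire context the model ever sees.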

Nevertheless I agree with the premise of the post. I used Markov chains recently to teach someone what a statistical model of language is, followed by explaining the perceptron to them, and then (hand-waving a bit) explaining how many large, deep layers scale everything up massively.

Seemingly? Is there not a direct technical reason to compare?

  • The people I was instructing are not very technical, so I had to hand-wave a lot. (Nevertheless I think they got a much better overview of the tech than they would have got by reading some pop-sci description.)