I like your question, and I cannot answer it. But I have a benchmark: I can write a Markov chain "language model" in around 10-20 lines of Python, with zero external libraries -- with tokenization and "training" on a text file, and generating novel output. I wrote it in several minutes and didn't bother to save it.
I'm curious how much time & code it would take to implement this LLM stuff at a similar level of quality and performance.
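For reference, here is a sketch of the kind of throwaway Markov chain the parent describes: whitespace tokenization, "training" by counting successors, and sampling novel output, with zero external libraries. The function names and the crude tokenizer are my own choices; this is a minimal illustration, not anyone's saved code:

```python
# Minimal word-level Markov chain "language model", standard library only.
import random
from collections import defaultdict

def train(path, order=2):
    words = open(path, encoding="utf-8").read().split()  # crude tokenizer
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])        # n-gram context
        chain[key].append(words[i + order])    # observed successor
    return chain

def generate(chain, length=50):
    state = random.choice(list(chain))         # random starting context
    out = list(state)
    for _ in range(length):
        followers = chain.get(state)
        if not followers:                      # dead end: stop early
            break
        out.append(random.choice(followers))   # sampling a list element
                                               # weights by frequency
        state = tuple(out[-len(state):])       # slide the context window
    return " ".join(out)
```

Storing repeated successors in a list means `random.choice` samples them in proportion to their corpus frequency, which keeps the code short at the cost of memory.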
FLOPs by perplexity by samples is an interesting way to compare this family of models.
Generally LLM architectures are pretty low-code, I think (I haven't written one myself).
Then all of the complexity comes with the training/weight data.
A typical demonstration Markov chain probably has a length of around 3. A typical recent LLM probably has more than three billion parameters. That's not precisely apples to apples, but the LLM is certainly vastly more complicated.
Number of parameters is not the difference. A Markov chain can easily be a multi-dimensional matrix with millions of entries. The significant difference is that a length 3 Markov chain can only ever find connections between 3 adjacent symbols (words, usually). LLMs seem to be able to find and connect abstract concepts at very long and variable distances in the input.
Nevertheless I agree with the premise of the posting. I used Markov chains recently to teach someone what a statistical model of language is, followed by explaining to them the perceptron, and then (hand-waving a bit) explaining how stacking many large, deep layers scales everything up massively.
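The table-size point can be made concrete with back-of-the-envelope arithmetic (the vocabulary size below is an illustrative assumption, not a measured figure):

```python
# Possible contexts for an order-n word-level Markov chain: vocab_size ** n.
# Even a modest vocabulary gives the table room to dwarf an LLM's parameter
# count -- yet the chain still only ever sees n adjacent words.
vocab_size = 10_000             # a modest English vocabulary (assumed)
order = 3
possible_contexts = vocab_size ** order
print(possible_contexts)        # 1000000000000 -- a trillion potential entries
```

In practice the table is sparse, since only contexts that actually occur in the corpus get entries, but the fixed, short context window remains.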
Seemingly? Is there not a direct technical reason to compare?
The way you describe it, it doesn't seem much more complicated to me, from a “how does it work” perspective, just way bigger.
The overall structure is the same as in "use statistics to predict the next token."
With a Markov chain, the statistics are as simple as a mapping of n-grams to the number of times each appears in the corpus.
With a LLM, the statistics are the result of 50 years of research in neural network architectures, terabytes of training data, and many millions of dollars worth of hardware, along with the teams to build and manage all the data pipelines.
So yes, much more complicated.
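That n-gram-to-count mapping fits in a few lines of standard-library Python (the helper name is my own):

```python
# The "statistics" of a Markov chain, as described above: a plain mapping
# from each n-gram to how often it occurs in the corpus.
from collections import Counter

def ngram_counts(tokens, n=2):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

counts = ngram_counts("the cat sat on the mat".split(), n=2)
# counts[("the", "cat")] == 1, counts[("cat", "sat")] == 1, etc.
```

The LLM's "statistics", by contrast, are billions of learned weights with no such direct, inspectable correspondence to the training corpus.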
If the minimal representation of a model of the behavior is "way bigger", why are you disputing that it's more complicated? What's the difference?