Comment by kromem

2 years ago

It would be more accurate to say that a Markov chain is an example of a method that would perform relatively well at the same training task as an LLM.

So too might a human trying to predict the next tokens.

But a human and a Markov chain do not use the same underlying process to achieve next-token prediction, and neither uses the same underlying process as an LLM.

LLMs are Markov chains; a Markov chain is a general concept, not just a text-modeling technique. You may be thinking of the very simple Markov chain models we had before, where you predicted the next word by looking up sentences with the same preceding words and picking one of the following words at random. That is also a Markov chain, just like an LLM, only a much simpler one. You're right that LLMs aren't like that, but they are still Markov chains, with the same kinds of inputs and outputs as the old ones.
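For the sake of concreteness, here is a minimal sketch of the "very simple" kind of Markov chain text model described above: record which words followed each word in a training text, then generate by repeatedly picking a random observed successor. The corpus and the order-1 (single-word) context are illustrative choices, not anything specific from the thread.

```python
import random
from collections import defaultdict

def train_bigram_chain(text):
    """Map each word to the list of words observed to follow it."""
    words = text.split()
    chain = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        chain[prev].append(nxt)
    return chain

def generate(chain, start, length, seed=0):
    """Walk the chain: the next word depends only on the current word."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        candidates = chain.get(out[-1])
        if not candidates:
            break  # dead end: current word never appeared mid-text
        out.append(rng.choice(candidates))
    return " ".join(out)

corpus = "the cat sat on the mat and the dog sat on the rug"
chain = train_bigram_chain(corpus)
print(generate(chain, "the", 6))
```

An LLM's "state" is the whole context window rather than one word, and its transition probabilities come from a learned network rather than a lookup table, but the input/output shape is the same: current state in, distribution over next tokens out.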