Comment by Jensson
2 years ago
LLMs are Markov chains. A Markov chain is a general concept, not just a text-modeling technique. You must be thinking of the very simple Markov chain models we had before, where you predicted the next word by looking up sentences with the same preceding words and picking one of those words at random. That is also a Markov chain, just like an LLM, only a much simpler one. You're right that LLMs aren't like that, but they are still Markov chains with the same kind of inputs and outputs as the old ones.
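For a concrete picture of that older technique, here is a minimal sketch in Python (the function names and parameters are made up for illustration, not taken from any particular system): build a table mapping each run of preceding words to the words that followed it in a corpus, then generate text by picking a random continuation of the current state.

    import random
    from collections import defaultdict

    def build_chain(words, order=2):
        # Map each tuple of `order` preceding words to the words that
        # followed that tuple somewhere in the corpus.
        chain = defaultdict(list)
        for i in range(len(words) - order):
            chain[tuple(words[i:i + order])].append(words[i + order])
        return chain

    def generate(chain, state, length=20):
        # The next word depends only on the current `order`-word state,
        # which is what makes this a Markov chain.
        out = list(state)
        for _ in range(length):
            choices = chain.get(tuple(out[-len(state):]))
            if not choices:
                break
            out.append(random.choice(choices))
        return " ".join(out)

Calling generate(build_chain(corpus.split()), ("the", "cat")) would extend the two-word state by repeated random table lookups.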
No, the self-attention in GPT's transformer means it isn't a Markov chain.
A blog post if you want to read more:
https://medium.com/@andrew_johnson_4/are-transformers-markov...
Isn't it, though, if you consider the entire context to be part of the state? His argument seems to rest on the assumption that the Markov model uses only the current word as its state.
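To make that reading concrete, a hedged sketch assuming a fixed context window and a hypothetical next_token_dist function standing in for the model's forward pass (neither is a real API): if the state is the whole window, the next-token distribution is a function of that state alone, and sampling a token just maps one state to the next.

    import random

    WINDOW = 4096  # hypothetical fixed context length

    def step(state, next_token_dist):
        # `state` is the entire context window (a tuple of token ids).
        # The distribution over the next token depends only on this
        # state, which is the Markov property over window-sized states.
        dist = next_token_dist(state)            # {token_id: probability}
        tokens, probs = zip(*dist.items())
        token = random.choices(tokens, weights=probs, k=1)[0]
        return (state + (token,))[-WINDOW:]      # the new state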
That post isn't the only one making this claim:
https://safjan.com/understanding-differences-gpt-transformer...
If we broaden the definition of a Markov chain so that the entire system counts as the current state used to determine the next step, then isn't literally every computer program a Markov chain under that definition?
You can't just fold the memory of previous states that the current state depends on into the "current state" to satisfy the definition of a Markov chain without broadening it to the point where the definition becomes meaningless.
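As a toy illustration of that objection (the state here is invented, not a reference to any real system): bundle any program's entire memory into its "state" and the next step trivially depends only on that state, so the broadened definition admits everything.

    def program_step(state):
        # The full memory of the program is the "state", so the next
        # step depends only on it -- by this broadened definition any
        # program whatsoever is "Markovian".
        counter, history = state
        return (counter + 1, history + [counter])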
The term "Markov chain" is used only for sequences where the "current" state depends only on the previous state.
https://en.wikipedia.org/wiki/Markov_property
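Written out, the Markov property from that link is:

    P(X_{n+1} = x \mid X_n = x_n, \ldots, X_1 = x_1) = P(X_{n+1} = x \mid X_n = x_n)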