Comment by ducktective
4 days ago
LLMs are Markov Chains on steroids?
By the way, does anyone know which model or type of model was used in winning gold in IMO?
> LLMs are Markov Chains on steroids?
It's not an unreasonable view, at least for decoder-only LLMs (which is what most popular LLMs are). While it may seem they violate the Markov property since they clearly make use of their history, in practice that entire history is summarized in an embedding passed into the decoder. I.e., just like a Markov chain, their entire history is compressed into a single point, which leaves them conditionally independent of their past given their present state.
It's worth noting that this claim does NOT apply to LLMs in general: both encoder-decoder and encoder-only LLMs violate the Markov property and therefore cannot properly be considered Markov chains in any meaningful way.
But running inference on a decoder-only model is, at a high enough level of abstraction, conceptually the same as running a Markov chain (on steroids).
A Markov process is any process where if you have perfect information on the current state, you cannot gain more information about the next state by looking at any previous state.
Physics models of closed systems moving under classical mechanics are deterministic, continuous Markov processes. Random walks on a graph are non-deterministic, discrete Markov processes.
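A random walk makes the Markov property concrete: the next node is sampled using only the current node, so the path taken to get there is irrelevant. A minimal sketch (the graph and names here are made up for illustration):

```python
import random

# Toy random walk on a small directed graph. The next node depends only
# on the current node, never on how we arrived -- the Markov property.
graph = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A"],
}

def random_walk(start, steps, rng):
    state = start
    path = [state]
    for _ in range(steps):
        # Transition uses only `state`; earlier path entries are ignored.
        state = rng.choice(graph[state])
        path.append(state)
    return path

rng = random.Random(0)
print(random_walk("A", 5, rng))
```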
You may further generalize: if a process has state X and the prior N states contribute to predicting the next state, you can define a new process whose state is an N-vector of Xs. The graph connecting those vector-states reduces the evolution of the system to a random walk on a graph, and thus to a Markov process.
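This N-vector trick is exactly how an order-N chain becomes an ordinary order-1 Markov chain. A small sketch with N=2 over characters (the corpus and variable names are just for illustration):

```python
import random
from collections import defaultdict

# Order-2 process over characters: the next symbol depends on the last
# N=2 symbols. Bundling those into a tuple makes it an order-1 Markov
# chain over tuple-states.
N = 2
text = "abracadabra"

transitions = defaultdict(list)
for i in range(len(text) - N):
    state = tuple(text[i:i + N])          # the N-vector of Xs
    transitions[state].append(text[i + N])

def step(state, rng):
    # Given the tuple-state alone, sample the next symbol: order-1.
    nxt = rng.choice(transitions[state])
    return state[1:] + (nxt,), nxt

rng = random.Random(0)
state = ("a", "b")
out = list(state)
for _ in range(8):
    if state not in transitions:
        break
    state, sym = step(state, rng)
    out.append(sym)
print("".join(out))
```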
Thus any system where the best possible model of its evolution requires you to examine at most finitely many consecutive states immediately preceding the current state is a Markov process.
For example, an LLM that will process a finite context window of tokens and then emit a weighted random token is most definitely a Markov process.
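In that framing, the truncated context window is the entire Markov state: given the last K tokens, anything older adds no information. A hypothetical sketch, with a toy stand-in for the model (all names and the fake "distribution" are assumptions, not any real LLM API):

```python
import random

K = 3  # context window size, standing in for an LLM's finite context

def toy_next_token_dist(window):
    # Stand-in for a decoder-only LLM's softmax over the vocabulary.
    # It sees ONLY the window, so the window is the full Markov state.
    rng = random.Random(hash(window))   # fixed per state within a run
    vocab = ["the", "cat", "sat", "mat"]
    weights = [rng.random() for _ in vocab]
    total = sum(weights)
    return {tok: w / total for tok, w in zip(vocab, weights)}

def generate(prompt, n, rng):
    tokens = list(prompt)
    for _ in range(n):
        window = tuple(tokens[-K:])     # truncate: drop older history
        dist = toy_next_token_dist(window)
        toks, probs = zip(*dist.items())
        tokens.append(rng.choices(toks, weights=probs)[0])
    return tokens

rng = random.Random(42)
print(generate(["the", "cat"], 6, rng))
```

The state space is enormous (vocab size to the power K), but it is finite, which is what licenses the "Markov chain on steroids" description.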
> LLMs are Markov Chains on steroids?
Might be a reference to this[1] blog post which was posted here[2] a year ago.
There has also been some academic work linking the two, like this[3] paper.
[1]: https://arxiv.org/abs/2410.02724