Comment by roadside_picnic
4 days ago
> LLMs are Markov Chains on steroids?
It's not an unreasonable view, at least for decoder-only LLMs (which is what most popular LLMs are). While it may seem they violate the Markov property since they clearly do make use of their history, in practice that entire history is summarized in an embedding passed into the decoder. I.e., just like a Markov chain, their entire history is compressed into a single point, which leaves them conditionally independent of their past given their present state.
It's worth noting that this claim is NOT generally applicable to LLMs since both encoder/decoder and encoder-only LLMs do violate the Markov property and therefore cannot be properly considered Markov chains in a meaningful way.
But running inference on a decoder-only model is, at a high enough level of abstraction, conceptually the same as running a Markov chain (on steroids).
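To make that concrete, here's a rough toy sketch, not how any real LLM is implemented. It assumes the "present state" is just the last few tokens of context (a real decoder would use a far longer window and a learned network instead of a count table). All the names here (`fit`, `next_token_dist`, `step`, `WINDOW`) are made up for illustration. The point is only that `step` looks at the current state and nothing else, which is the Markov property.

```python
import random
from collections import Counter, defaultdict

WINDOW = 3  # hypothetical context length; real models use thousands of tokens


def fit(tokens, window=WINDOW):
    """Toy stand-in for a trained decoder: next-token counts per context."""
    table = defaultdict(Counter)
    for i in range(1, len(tokens)):
        for k in range(1, window + 1):
            if i - k >= 0:
                table[tuple(tokens[i - k:i])][tokens[i]] += 1
    return table


def next_token_dist(table, state):
    """Next-token distribution; a function of `state` and nothing else."""
    # Back off to shorter suffixes of the state until one was seen in training.
    for k in range(len(state), 0, -1):
        counts = table.get(state[-k:])
        if counts:
            total = sum(counts.values())
            return {tok: c / total for tok, c in counts.items()}
    return {}


def step(table, state, rng, window=WINDOW):
    """One Markov transition: context window in, context window out."""
    dist = next_token_dist(table, state)
    if not dist:
        return None  # no known continuation for this state
    toks = sorted(dist)
    tok = rng.choices(toks, weights=[dist[t] for t in toks])[0]
    return (state + (tok,))[-window:]


corpus = "the cat sat on the mat and the dog sat on the rug".split()
table = fit(corpus)

state = ("sat", "on", "the")
rng = random.Random(0)
for _ in range(5):
    print(" ".join(state))
    state = step(table, state, rng)
    if state is None:
        break
```

Two different prompts that end in the same window give exactly the same distribution over what comes next, however they got there. The "on steroids" part is the size of the state space: with a vocabulary of V tokens and a window of n tokens there are up to V^n states, so nobody could ever write the transition table down explicitly.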