Comment by _Algernon_

3 days ago

LLMs aren't Markov chains unless they have a context window of 1.

>In probability theory and statistics, a Markov chain or Markov process is a stochastic process describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event

The tricky thing is that you get to define the state. So if the "state" is the current word _and_ the previous 10 words, it is still "memoryless". So an LLM's context window is the state. It doesn't matter whether _we_ call parts of that state "history"; the Markov chain doesn't care (they are all just features of the state).
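
To make that concrete, here's a toy sketch (my own illustration, not from the thread) of an order-k Markov chain where the state is defined as the tuple of the last k tokens. The next-token distribution depends only on that tuple, so it's memoryless over states even though each state encodes a chunk of history; an LLM is the analogous thing with k equal to the context length and a learned transition function instead of counts:

```python
import random
from collections import defaultdict

def build_chain(tokens, k=2):
    """Count next-token frequencies conditioned on the last k tokens."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(tokens) - k):
        state = tuple(tokens[i:i + k])   # the "state" is the last k tokens
        nxt = tokens[i + k]
        counts[state][nxt] += 1
    return counts

def sample_next(counts, state):
    """Sample the next token; it depends only on `state`, nothing earlier."""
    toks, weights = zip(*counts[state].items())
    return random.choices(toks, weights=weights)[0]

tokens = "the cat sat on the mat and the cat slept on the mat".split()
chain = build_chain(tokens, k=2)
print(sample_next(chain, ("the", "cat")))  # "sat" or "slept", chosen from counts
```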

Edit: I could be missing important nuance that other people are pointing out in this thread!