Comment by srean

4 days ago

I suggest getting familiar with or brushing up on the differences between a Markov Chain and a Markov Model. The former is a substantial restriction of the latter. The classic by Kemeny and Snell is a good readable reference.

Markov chains have a constant, finite context length: their state is the most recent k-tuple of emitted symbols, and the transition probabilities are invariant (with respect to time and the tokens emitted so far).

LLMs definitely also have a finite context length. And if we consider padding, it is also constant. The k is huge compared to most Markov chains used historically, but that doesn't make it any less finite.

  • That's not correct. Even a toy like an exponentially weighted moving average produces unbounded context (of diminishing influence).
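As a minimal sketch of the EWMA point (my illustration, not the commenter's): the recurrence s_t = α·x_t + (1 − α)·s_{t−1}, when unrolled, gives every past input a geometrically decaying weight, so the single scalar state reflects the entire history, not just the last k inputs.

```python
def ewma(xs, alpha=0.5):
    """Exponentially weighted moving average: s = alpha*x + (1-alpha)*s."""
    s = 0.0
    for x in xs:
        s = alpha * x + (1 - alpha) * s
    return s

# The very first input still influences the state after 20 further steps,
# however small its weight has become -- unbounded context, diminishing influence.
a = ewma([1.0] + [0.0] * 20)
b = ewma([0.0] + [0.0] * 20)
print(a != b)  # True: the state distinguishes histories arbitrarily far back
```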

    • What do you mean? I can only input k tokens into my LLM to calculate the probs. That is the definition of my state. In exactly the way that N-gram LMs use N tokens, except that instead of an ML model they calculate the probabilities from observed frequencies. There is no unbounded context anywhere.
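A hypothetical sketch of the N-gram LM described in the reply above (the corpus and function names are my own): the "state" is the last N−1 tokens, and probabilities come from observed frequencies.

```python
from collections import Counter, defaultdict

def train_ngram(tokens, n=2):
    """Count, for each (n-1)-token context, which tokens follow it."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i : i + n - 1])
        counts[context][tokens[i + n - 1]] += 1
    return counts

def prob(counts, context, token):
    """P(token | context) from observed frequencies."""
    c = counts[tuple(context)]
    total = sum(c.values())
    return c[token] / total if total else 0.0

corpus = "the cat sat on the mat".split()
model = train_ngram(corpus, n=2)
print(prob(model, ["the"], "cat"))  # 0.5: "the" is followed once by "cat", once by "mat"
```

This is the bounded-context construction the reply appeals to: nothing outside the last N−1 tokens affects the prediction.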
