
Comment by mr_wiglaf

5 days ago

The tricky thing is that you get to define the state. If the "state" is the current word _and_ the previous 10 words, the process is still "memoryless". By the same logic, an LLM's context window is its state. It doesn't matter that _we_ think of part of that state as "history"; the Markov chain doesn't care (they're all just different features of the state).
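A minimal sketch of what I mean (the corpus and the choice of N here are just made up for illustration): a chain whose state is the last N words still picks the next word from the current state alone, so it's memoryless even though the state "contains history".

```python
import random

N = 3  # state = the current word plus the previous N-1 words

def build_transitions(words, n):
    """Map each n-word state tuple to the words observed to follow it."""
    table = {}
    for i in range(len(words) - n):
        state = tuple(words[i:i + n])
        table.setdefault(state, []).append(words[i + n])
    return table

# Toy corpus, purely illustrative
corpus = "the cat sat on the mat and the cat sat on the rug".split()
transitions = build_transitions(corpus, N)

state = tuple(corpus[:N])
output = list(state)
for _ in range(8):
    # The next word depends only on the current state tuple...
    nxt = random.choice(transitions.get(state, ["."]))
    output.append(nxt)
    # ...and the new state is just the last N words, nothing older.
    state = tuple(output[-N:])

print(" ".join(output))
```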

Edit: I could be missing important nuance that other people are pointing out in this thread!