Comment by srean
4 days ago
That's not correct. Even a toy like an exponentially weighted moving average produces unbounded context (of diminishing influence).
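To spell that out: with smoothing factor a, the EWMA state is s_t = a*x_t + (1-a)*s_{t-1}; unrolling the recursion gives s_t = a * sum_k (1-a)^k * x_{t-k}, so every past input still contributes, just with geometrically shrinking weight. A toy sketch of my own to illustrate (nothing model-specific assumed):

```python
# Toy sketch: the EWMA keeps only a single number as state, yet that number
# depends on the entire input history with exponentially decaying weights.
def ewma(xs, a=0.1):
    s = 0.0
    for x in xs:
        s = a * x + (1 - a) * s   # s_t = a*x_t + (1-a)*s_{t-1}
    return s

# ewma([5, 0, 0, ...]) is still slightly influenced by that leading 5 no matter
# how long the sequence gets: unbounded context, diminishing influence.
```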
What do you mean? I can only input k tokens into my LLM to calculate the probabilities; that is the definition of my state. It's exactly how N-gram LMs use N tokens, except that instead of an ML model they compute the probabilities from observed frequencies. There is no unbounded context anywhere.
That's different.
You can certainly feed k-grams one at a time to estimate the probability distribution over the next token, use that to simulate a Markov chain, and reinitialize the LLM at each step (drop the context). In that process the LLM is just a lookup table used to simulate your MC.
But an LLM on its own doesn't drop context to generate; its transition probabilities change depending on all the tokens generated so far.
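To make the distinction concrete, here is a rough sketch (llm_next_token_probs is a hypothetical stand-in for the model, and the vocabulary is a toy one; this is an illustration, not anyone's actual implementation). Truncating to the last k tokens and reinitializing at every step gives a genuine order-k Markov chain; letting the model keep its whole prefix does not.

```python
import random

VOCAB = ["a", "b", "c"]

def llm_next_token_probs(context_tokens):
    """Hypothetical stand-in for the model: maps a token prefix to a
    next-token distribution. Here just a toy uniform distribution."""
    return {t: 1.0 / len(VOCAB) for t in VOCAB}

def simulate_markov_chain(seed_tokens, k, steps):
    """Use the model as a lookup table for an order-k Markov chain:
    the context is reinitialized to the last k tokens at every step."""
    tokens = list(seed_tokens)
    for _ in range(steps):
        state = tokens[-k:]                    # drop everything older than k tokens
        probs = llm_next_token_probs(state)
        tokens.append(random.choices(list(probs), weights=list(probs.values()))[0])
    return tokens

def generate_with_llm(seed_tokens, steps):
    """The model's own generation loop: the next-token distribution is
    conditioned on the whole prefix, so the transition probabilities
    change with every token produced."""
    tokens = list(seed_tokens)
    for _ in range(steps):
        probs = llm_next_token_probs(tokens)   # full prefix, no truncation
        tokens.append(random.choices(list(probs), weights=list(probs.values()))[0])
    return tokens
```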