Comment by wizzwizz4

5 days ago

A GPT model would be modelled as an n-gram Markov model where n is the size of the context window. This is slightly useful for getting some crude bounds on the behaviour of GPT models in general, but is not a very efficient way to store a GPT model.

I'm not saying it's an n-gram Markov model or that you should store one as a lookup table. Markov models are just a mathematical concept; they say nothing about storage, only that the state-transition probabilities are a pure function of the current state.
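A minimal sketch of this point (toy example, not any real GPT architecture): the state is the last `N` tokens, and the next-token distribution is *computed* by a pure function of that state, with no lookup table anywhere. The Markov property constrains the dependence on state, not the representation.

```python
import random

N = 3  # order: the state is the last N tokens (cf. a context window)

def next_token_probs(state):
    """Pure function of the current state -> distribution over next tokens.

    Nothing here requires a lookup table: the probabilities are computed
    on the fly, the way a GPT forward pass computes them from its context.
    (Toy rule: favour tokens later in the alphabet than the max in state.)
    """
    vocab = "abcd"
    pivot = max(state)
    weights = [3.0 if t > pivot else 1.0 for t in vocab]
    total = sum(weights)
    return {t: w / total for t, w in zip(vocab, weights)}

def step(state):
    probs = next_token_probs(state)
    tokens, weights = zip(*probs.items())
    tok = random.choices(tokens, weights=weights)[0]
    # Markov property: the new state depends only on the old state.
    return (state + (tok,))[-N:]

state = ("a", "b", "c")
for _ in range(5):
    state = step(state)
```

The same structure describes a GPT model: replace `next_token_probs` with a forward pass over the context window, and you have a (very high-order) Markov chain whose transition function happens to be stored as network weights.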

  • You say the state can be anything, with no restrictions at all. Let me sell you a perfect predictor then :) The state is the next token.