Comment by thaumasiotes
11 days ago
>> The fact that they only generate sequences that existed in the source
> I am quite confused right now. Could you please help me with this?
This is pretty straightforward. Sohcahtoa82 doesn't know what he's saying.
I'm fully open to being corrected. Just telling me I'm wrong without elaborating does absolutely nothing to foster understanding and learning.
If you still think there's something left to explain, I recommend you read your other responses. Being restricted to the training data is not a property of Markov output. You'd have to be very, very badly confused to think that it was. (And it should be noted that a Markov chain itself doesn't contain any training data, as is also true of an LLM.)
More generally, since an LLM is a Markov chain, it doesn't make sense to try to answer the question "what's the difference between an LLM and a Markov chain?" Here, the question is "what's the difference between a tiny LLM and a Markov chain?", and assuming "tiny" refers to window size, and the Markov chain has a similarly tiny window size, they are the same thing.
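To make this concrete, here's a minimal sketch of an order-k word-level Markov chain (the toy corpus, window size, and seed are just illustrative assumptions). Note that the text it generates need not appear anywhere in the training text:

    import random
    from collections import defaultdict

    corpus = ("the cat sat on the mat . the dog sat on the rug . "
              "the cat ran to the rug .").split()
    k = 2  # "window size": the next word depends only on the previous k words

    # Record every (k-gram -> next word) transition seen in the corpus.
    table = defaultdict(list)
    for i in range(len(corpus) - k):
        table[tuple(corpus[i:i + k])].append(corpus[i + k])

    rng = random.Random(0)
    state = tuple(corpus[:k])   # start from "the cat"
    out = list(state)
    for _ in range(10):
        nexts = table.get(state)
        if not nexts:
            break
        out.append(rng.choice(nexts))
        state = tuple(out[-k:])

    generated = " ".join(out)
    print(generated)
    # Every (k+1)-gram in the output occurred somewhere in the corpus, but the
    # output as a whole need not have: e.g. "the cat sat on the rug ." is
    # reachable here even though that sentence never appears in the training text.
    print("appears verbatim in corpus?", generated in " ".join(corpus))

An LLM with a context window of k tokens defines the same kind of object, a conditional distribution over the next token given the last k tokens; only the representation of that distribution differs.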
An LLM is not a Markov chain of the input tokens, because it has internal computational state (the KV cache and residuals).
An LLM is a Markov process if you include its entire state, but that's a pretty degenerate definition.
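A toy illustration of the distinction being drawn here (the update rules are made up; they just stand in for internal state that the visible tokens don't expose):

    def step(state, token):
        # hypothetical state-update rule, purely for illustration
        return (2 * state + token + 1) % 7

    def next_token(state):
        # the emitted token depends on the hidden state, not directly on prior tokens
        return state % 3

    state, tokens = 0, []
    for _ in range(8):
        t = next_token(state)
        tokens.append(t)
        state = step(state, t)

    print(tokens)  # [0, 1, 1, 0, 0, 1, 1, 0]
    # Looking only at the visible tokens, a 0 is sometimes followed by 0 and
    # sometimes by 1, depending on the hidden state, so the token sequence by
    # itself is not a first-order Markov chain.  Over the pair (token, state),
    # each step depends only on the current pair, so the process is Markov in
    # the degenerate sense described above.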
2 replies →
He said LLMs are creative, yet people have been telling me that LLMs cannot solve problems that are not in their training data. I want this to be clarified or elaborated on.
9 replies →
1) Being restricted to exact matches in the input is the definition of Markov Chains
2) LLMs are not Markov Chains
13 replies →