Comment by thwarted
2 years ago
An LLM is a Markov chain with billions of associations and weights. A Markov chain is an LLM with maybe a few dozen associations and weights (so an LM, without the first L).
The difference is in the data structure and the size of the atoms/n-grams. The data structure Markov chain implementations use is not efficient for billions of parameters, either in storage or in processing. But the idea is the same: produce a likely next token given the last n tokens. The value of n is a narrow window for Markov chains and an extremely wide window for LLMs. LLMs are able to maintain massive amounts of state compared to a Markov chain implementation.
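To make that concrete, here's a minimal sketch of the classic Markov-chain implementation (the toy corpus, the dict-of-lists data structure, and n=2 are my own illustrative choices, not anything from a real system):

    import random
    from collections import defaultdict

    def build_chain(tokens, n=2):
        # map each length-n history to every token that followed it in the corpus
        chain = defaultdict(list)
        for i in range(len(tokens) - n):
            chain[tuple(tokens[i:i + n])].append(tokens[i + n])
        return chain

    def next_token(chain, history):
        # exact lookup on the last n tokens, then sample among observed continuations
        candidates = chain.get(tuple(history))
        return random.choice(candidates) if candidates else None

    corpus = "the cat sat on the mat and the cat sat on the rug".split()
    chain = build_chain(corpus, n=2)
    print(next_token(chain, ["the", "cat"]))  # 'sat' -- the only continuation observed

The whole "model" is an exact-match table keyed on the last n tokens; growing n or the vocabulary blows that structure up, which is exactly the storage/processing inefficiency mentioned above.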
“A human brain is just like a dog’s brain, only with more neural pathways.” True, perhaps, but largely pointless: at some point neural complexity results in a difference of kind, not of degree.
I’d argue the same is true of LLMs vs simpler models like Markov chains.
But a human brain is also just like an elephant's or whale's brain, except with 1/5th of the neural pathways at best, probably fewer.
There's a qualitative difference too.
I think this vastly underplays animal intelligence, though. There is so much focus on creating human-level intelligence, but where is a robot that can learn to navigate the world like a dog or cat can?
It was an analogy about how stupid it is to say a Markov chain is “just like an LLM.”
The idea behind a complex system can be very simple, but at scale it yields very different results.
Comparing a Markov chain with an LLM is kind of like comparing a single-cell organism to a human being because both are based on cells.
As already mentioned, Markov chains were used in practice as (poor) Turing tests (IMHO working best in art projects and, sadly, spam).
Sure, today's LLMs blow them out of the water, but the difference was much less striking with neural networks even as late as 2010.
That’s interesting that you put the date at 2010! When I learned machine learning in ~2016-2018, Markov chains were still THE fundamental tool of NLP. Neural networks were all the rage ofc, but still… could you tell me what tech/change you’re thinking of?
For example, I remember /r/SubredditSimulator blowing my mind, and I’m pretty sure that was Markov chains.
It's not just window size. It's the difference between syntax and semantics.
A Markov model, by definition, works only with literal token histories. It can't participate meaningfully in a conversation unless the user happens to employ token sequences that the model has seen before (ideally multiple times). An LLM can explain why it's not just a Markov model, but the converse isn't true.
Now, if you were to add high-dimensional latent-space embedding to a Markov model, that would make the comparison more meaningful, and would allow tractable computation with model sizes that were completely impossible to deal with before. But then it wouldn't be a Markov model anymore. Or, rather, it would still be a Markov model, but one that's based on relationships between tokens rather than just their positions in a linear list.
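A toy way to see the difference (the two-word histories and the 3-d "embeddings" below are made-up numbers purely for illustration): an exact-match Markov table returns nothing for a history it has never seen, while a latent-space lookup can fall back on the nearest history it does know.

    import numpy as np

    chain = {("the", "dog"): "barked", ("the", "cat"): "meowed"}
    embed = {  # hypothetical word vectors; a real system would learn these
        "the":   np.array([1.0, 0.0,  0.0]),
        "dog":   np.array([0.0, 1.0,  0.1]),
        "cat":   np.array([0.0, 0.9,  0.2]),
        "puppy": np.array([0.0, 0.95, 0.12]),
    }

    def literal_lookup(history):
        # classic Markov chain: the history must match a stored key verbatim
        return chain.get(tuple(history))

    def latent_lookup(history):
        # find the stored history closest in embedding space and reuse its continuation
        query = sum(embed[w] for w in history)
        nearest = min(chain, key=lambda h: np.linalg.norm(query - sum(embed[w] for w in h)))
        return chain[nearest]

    print(literal_lookup(["the", "puppy"]))  # None: this exact sequence was never seen
    print(latent_lookup(["the", "puppy"]))   # 'barked': "puppy" sits near "dog" in the toy space

Whether you still want to call that second thing a Markov model is basically the question above.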
Another analogy might be to say that a Markov model can implement lossless compression only, while a latent-space model can implement lossy compression. There's a school of thought that says that lossy compression doesn't just require intelligence, it is intelligence, and LLMs can be seen as an example of that equivalence. Not saying I agree with that school, or that you should, but as someone else pointed out, comparing Markov chains with LLMs is at best like comparing goldfish brains with human brains.
I like: Intelligence is compressing information into an irreducible representation.
Which leads to a wonderful tongue-in-cheek contraindication: when a representation, such as a particular model, grows in complexity, especially via edge cases, that growth is the result of agentic anti-intelligence.
That is to say, anything that increases in complexity without being refactored is a sign of a lack of intelligence or worse.
And any representation of information that cannot be reduced further while maintaining equal or greater expressibility is a sign of maximum agentic intelligence.
>high-dimensional latent-space embedding to a Markov model
That's what we call a hidden Markov model.
>There's a school of thought that says that lossy compression doesn't just require intelligence, it is intelligence, and LLMs can be seen as an example of that equivalence.
SVD is used to implement lossy compression, as is JPEG encoding... these algorithms are in no way intelligent.
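For concreteness, here's what I mean by SVD-based lossy compression (the synthetic 64x64 "image" and rank k=8 are arbitrary illustrative choices):

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 64)
    image = np.outer(np.sin(3 * x), np.cos(5 * x)) + 0.05 * rng.random((64, 64))

    U, s, Vt = np.linalg.svd(image, full_matrices=False)
    k = 8                                                # keep only the 8 largest singular values
    approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    stored = k * (U.shape[0] + Vt.shape[1] + 1)          # values kept: 8 * (64 + 64 + 1) = 1032
    print(stored / image.size)                           # ~0.25 of the original 4096 values
    print(np.abs(image - approx).mean())                 # nonzero reconstruction error: it's lossy

There is no learning anywhere in that; it just throws away the small singular values.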
They’re doing highly specific tasks where the intelligence can come from the designer of the algorithm.
In particular, JPEG has intelligence encoded about how graphics are displayed, what detail we won’t notice is missing, and what artifacts we won’t notice are present, much like the psychoacoustic models behind lossy music compression schemes like MP3.
But we had to feed that into the encoder by way of algorithmic design. It’s hardcoded intelligence, like any other function, but with a lot more outside knowledge required to do it right than a sort or swap.
I’d call an LLM a more general problem solver. It can write cogent limericks, convincingly screw up math, summarize papers it’s never seen before, generate book plots or character arcs based on specific requests, translate to a language you just made up and explained in the prompt, etc.
The intelligent bits are emergent and can do something reasonable with novel input, even if it doesn’t closely resemble the exact material it was trained on.
The comparison would be a process that could lossy-compress any kind of sensory media possible with no perceptible loss, based solely on its training on human capabilities and how the reproduction devices work—i.e. it could create the JPEG algorithm, not just perform it.
> SVD is used to implement lossy compression, as is JPEG encoding... these algorithms are in no way intelligent.
You'll have to take that up with people above my pay grade. It's not that simple, apparently. Call me when a Markov model can explain why it's equivalent to an LLM.
An LLM is a Markov chain in the same sense that a cat is a tiger, technically true but it misses the qualia.
It's not. There are fundamental architectural differences that couldn't be bigger.
A better comparison would be that it's like a windup toy versus a group of humans moving an entire civilization. They both move along a distance, but just listing the systems the human group has that the windup toy doesn't would take more than a page.
> It's not. There are fundamental architectural differences that couldn't be bigger.
LLM architecture is a Markov chain to the core. It isn't a lookup table like old Markov chains, but it is still a Markov chain: next-word prediction based on previous words.
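To put the claim in code form: the generation loop is the same whatever sits inside the next-token box. (The lookup_table_model stub and its one-entry table below are mine, purely to illustrate the interface; a real LLM replaces that function with a transformer forward pass.)

    import random

    def lookup_table_model(context):
        # old-style Markov chain: literal lookup keyed on the most recent token
        table = {("hello",): {"world": 1.0}}
        return table.get(tuple(context[-1:]), {"<eos>": 1.0})

    def generate(model, context, max_new_tokens=5, window=1024):
        # the loop both systems share: sample from P(next | last `window` tokens), append, repeat
        tokens = list(context)
        for _ in range(max_new_tokens):
            probs = model(tokens[-window:])
            next_tok = random.choices(list(probs), weights=list(probs.values()))[0]
            if next_tok == "<eos>":
                break
            tokens.append(next_tok)
        return tokens

    print(generate(lookup_table_model, ["hello"]))  # ['hello', 'world']

Whether "same sampling loop" amounts to "same architecture" is, of course, the whole disagreement.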