
Comment by CamperBob2

2 years ago

It's not just window size. It's the difference between syntax and semantics.

A Markov model, by definition, works only with literal token histories. It can't participate meaningfully in a conversation unless the user happens to employ token sequences that the model has seen before (ideally multiple times). An LLM can explain why it's not just a Markov model, but the converse isn't true.
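To make the "literal token histories" point concrete, here is a minimal sketch of an order-2 Markov text model in Python. It is my own illustration, not anything from the comment, and the corpus and function names are invented:

```python
# Minimal sketch (not from the comment): an order-2 Markov text model.
# It can only continue a history it has literally seen during training.
import random
from collections import defaultdict

def train_markov(tokens, order=2):
    """Map each exact history (a tuple of `order` tokens) to the next tokens seen after it."""
    table = defaultdict(list)
    for i in range(len(tokens) - order):
        table[tuple(tokens[i:i + order])].append(tokens[i + order])
    return table

def generate(table, seed, length=10, order=2):
    out = list(seed)
    for _ in range(length):
        history = tuple(out[-order:])
        if history not in table:   # unseen history: the model has nothing to say
            break
        out.append(random.choice(table[history]))
    return " ".join(out)

corpus = "the cat sat on the mat and the cat slept on the mat".split()
table = train_markov(corpus)
print(generate(table, ["the", "cat"]))      # continues: this history was seen
print(generate(table, ["purple", "cat"]))   # stops at once: this history was never seen
```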

Now, if you were to add high-dimensional latent-space embedding to a Markov model, that would make the comparison more meaningful, and would allow tractable computation with model sizes that were completely impossible to deal with before. But then it wouldn't be a Markov model anymore. Or, rather, it would still be a Markov model, but one that's based on relationships between tokens rather than just their positions in a linear list.
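As a hedged sketch of that hypothetical embedding-augmented variant (again my own illustration, with invented toy embeddings), the lookup below keys transitions on similarity between history embeddings rather than on exact matches, so an unseen but similar history still yields a prediction:

```python
# Hypothetical "Markov model over relationships between tokens" (my sketch only).
import numpy as np
from collections import defaultdict

# Toy 2-d embeddings; "dog" is deliberately placed close to "cat".
EMB = {
    "the": np.array([0.1, 0.9]), "cat": np.array([0.9, 0.2]),
    "dog": np.array([0.85, 0.25]), "sat": np.array([0.4, 0.4]),
    "on":  np.array([0.2, 0.6]), "mat": np.array([0.7, 0.7]),
}

def embed(history):
    return np.mean([EMB[t] for t in history], axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def train(tokens, order=2):
    """Store (history embedding, next token) pairs instead of literal histories."""
    return [(embed(tokens[i:i + order]), tokens[i + order])
            for i in range(len(tokens) - order)]

def predict(memory, history):
    """Weight every stored transition by its similarity to the query history."""
    query = embed(history)
    scores = defaultdict(float)
    for vec, nxt in memory:
        scores[nxt] += max(cosine(query, vec), 0.0)
    return max(scores, key=scores.get)

corpus = "the cat sat on the mat".split()
memory = train(corpus)
print(predict(memory, ["the", "dog"]))  # history never seen, but close to "the cat" -> "sat"
```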

Another analogy might be to say that a Markov model can implement lossless compression only, while a latent-space model can implement lossy compression. There's a school of thought that says that lossy compression doesn't just require intelligence, it is intelligence, and LLMs can be seen as an example of that equivalence. Not saying I agree with that school, or that you should, but as someone else pointed out, comparing Markov chains with LLMs is at best like comparing goldfish brains with human brains.

I like: Intelligence is compressing information into an irreducible representation.

Which leads to a wonderful tongue-in-cheek contraindication: when a representation type, such as a particular model, increases in complexity, especially via edge cases, that increase is the result of agentic anti-intelligence.

That is to say, anything that increases in complexity without being refactored is a sign of a lack of intelligence or worse.

And any representation of information that cannot be reduced further while maintaining equal or greater expressibility is a sign of maximum agentic intelligence.

>high-dimensional latent-space embedding to a Markov model

That's what we call a hidden Markov model.
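For readers who haven't met the term: a hidden Markov model chains over unobserved latent states that emit the visible tokens, rather than chaining over the tokens themselves. The toy sampler below (my illustration, with made-up states and probabilities) shows the shape of the idea without taking a position on whether it matches the embedding variant described above:

```python
# Toy hidden Markov model sampler (invented states and probabilities).
import random

TRANSITIONS = {          # latent state -> next latent state probabilities
    "NOUNISH": {"VERBISH": 0.8, "NOUNISH": 0.2},
    "VERBISH": {"NOUNISH": 0.9, "VERBISH": 0.1},
}
EMISSIONS = {            # latent state -> observed token probabilities
    "NOUNISH": {"cat": 0.5, "mat": 0.5},
    "VERBISH": {"sat": 0.6, "slept": 0.4},
}

def sample(dist):
    return random.choices(list(dist), weights=list(dist.values()))[0]

def generate(start="NOUNISH", length=6):
    state, tokens = start, []
    for _ in range(length):
        tokens.append(sample(EMISSIONS[state]))  # emit a visible token
        state = sample(TRANSITIONS[state])       # step through the hidden space
    return " ".join(tokens)

print(generate())  # e.g. "cat sat mat slept cat sat"
```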

>There's a school of thought that says that lossy compression doesn't just require intelligence, it is intelligence, and LLMs can be seen as an example of that equivalence.

SVD is used to implement lossy compression, as is JPEG encoding... and these algorithms are in no way intelligent.
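For concreteness, this is roughly what the SVD route to lossy compression looks like: a generic truncated-SVD sketch in NumPy (my example, not drawn from the thread) that keeps only the top-k singular values of a matrix standing in for an image:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 64)
# Stand-in for a smooth grayscale image with a little noise.
image = np.outer(np.sin(4 * x), np.cos(3 * x)) + 0.05 * rng.random((64, 64))

U, s, Vt = np.linalg.svd(image, full_matrices=False)

k = 8                                   # rank of the lossy approximation
approx = (U[:, :k] * s[:k]) @ Vt[:k, :]

kept = U[:, :k].size + s[:k].size + Vt[:k, :].size
error = np.linalg.norm(image - approx) / np.linalg.norm(image)
print(f"stored {kept} numbers instead of {image.size}; relative error {error:.4f}")
```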

  • They’re doing highly specific tasks where the intelligence can come from the designer of the algorithm.

    In particular, JPEG has intelligence encoded about how graphics are displayed, what detail we won’t notice is missing, and what artifacts we won’t notice are present, much like the psychoacoustic models behind lossy music compression schemes like MP3.

    But we had to feed the encoder that knowledge by way of algorithmic design (a toy sketch of that quantization step appears after this thread). It's hardcoded intelligence, like any other function, but with a lot more outside knowledge required to do it right than a sort or swap.

    I’d call an LLM a more general problem solver. It can write cogent limericks, convincingly screw up math, summarize papers it’s never seen before, generate book plots or character arcs based on specific requests, translate to a language you just made up and explained in the prompt, etc.

    The intelligent bits are emergent and can do something reasonable with novel input, even if it doesn’t closely resemble the exact material it was trained on.

    The fair comparison would be a process that could lossily compress any kind of sensory media with no perceptible loss, based solely on its training on human perceptual capabilities and on how the reproduction devices work; i.e., it could create the JPEG algorithm, not just perform it.

  • > SVD is used to implement lossy compression, as is JPEG encoding... and these algorithms are in no way intelligent.

    You'll have to take that up with people above my pay grade. It's not that simple, apparently. Call me when a Markov model can explain why it's equivalent to an LLM.
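As a footnote to the JPEG point in the thread above, here is a heavily simplified toy version of that quantization idea (my own sketch, not real JPEG, with an invented quantization table). The designer's "hardcoded intelligence" lives in how coarsely the higher-frequency coefficients are rounded away:

```python
import numpy as np

N = 8
k = np.arange(N)
# Orthonormal DCT-II basis matrix for an 8x8 block.
C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * N))
C[0, :] *= np.sqrt(1 / N)
C[1:, :] *= np.sqrt(2 / N)

# Invented quantization table: coarser steps for higher frequencies, i.e. the
# designer's knowledge of which detail viewers are least likely to notice.
quant = 1 + 4 * (k[:, None] + k[None, :])

block = np.add.outer(np.linspace(0, 100, N), np.linspace(0, 100, N))  # smooth 8x8 patch

coeffs = C @ block @ C.T                    # pixels -> frequency space
quantized = np.round(coeffs / quant)        # the lossy step: fine detail rounds to zero
restored = C.T @ (quantized * quant) @ C    # back to pixel space

print("nonzero coefficients kept:", np.count_nonzero(quantized), "of", N * N)
print("max pixel error:", round(float(np.abs(block - restored).max()), 2))
```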