Comment by atum47

5 days ago

I usually have these technical hypothetical discussions with ChatGPT (I can share if you like), me asking him things like: aren't LLMs just huge Markov Chains?! And now I see your project... Funny

> I can share if you like

Respectfully, absolutely nobody wants to read a copy-and-paste of a chat session with ChatGPT.

  • When you say nobody, you mean you, right? You can't possibly be answering for every single person in the world.

    I was having a discussion about the similarities between Markov Chains and LLMs, and shortly after I found this topic on HN. When I wrote "I can share if you like", it was as proof of the coincidence.

LLMs are indeed Markov chains. The breakthrough is that we are able to efficiently compute well-performing probabilities for many states using ML.

  • LLMs are not Markov Chains unless you contort the meaning of a Markov Model State so much you could even include the human brain.

    • Not sure why that's contorting; a Markov model is anything where you know the probability of going from state A to state B. The state can be anything. For text generation, the state is the text so far, and a transition goes from that text to the same text with an extra character appended, which is true for both LLMs and old-school n-gram Markov models (a word-level sketch follows this sub-thread).

      21 replies →
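The n-gram view described in the reply above can be made concrete in a few lines. Below is a minimal sketch of a word-level Markov chain generator, not code from the thread; the toy corpus and the order-2 state are arbitrary assumptions. The state is the last n words, and a transition appends one word:

```python
import random
from collections import defaultdict

def build_chain(text, n=2):
    """Map each n-word state to the words observed to follow it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - n):
        chain[tuple(words[i:i + n])].append(words[i + n])
    return chain

def generate(chain, length=20):
    """Walk the chain: sample the next word given the current n-word state."""
    state = random.choice(list(chain.keys()))
    out = list(state)
    for _ in range(length):
        followers = chain.get(state)
        if not followers:                 # dead end: this state was never seen mid-corpus
            break
        out.append(random.choice(followers))
        state = tuple(out[-len(state):])  # new state = old state shifted by one word
    return " ".join(out)

corpus = "the cat sat on the mat and the dog sat on the rug and the cat ate"
print(generate(build_chain(corpus, n=2)))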

  • Markov models with more than 3 words as "context window" produce very unoriginal text in my experience (the corpus used had almost 200k sentences, almost 3 million words), matching the OP's experience. These are by no means large corpora, but I know the problem isn't going away with a larger corpus.[1] The Markov chain will wander into "valleys" where it reproduces paragraphs of its corpus word for word, because it will stumble upon 4-word sequences that it has only seen once (a rough way to measure this is sketched below, after the footnote). This is because the 4 words form a token, not a context window. Markov chains don't have what LLMs have.

    If you use syllable-level tokens in a Markov model, the model can't form real words much beyond the second syllable, and you have no way of making it make more sense other than increasing the token size, which exponentially decreases originality. This is the simplest way I can explain it, though I had to address why scaling doesn't work.

    [1] With a vocabulary of roughly 400,000 English words there are 400,000^4 possible 4-word sequences (barring grammar), meaning only a corpus with 8 times that many words, and with no repetition, could offer two ways to chain each possible 4-word sequence.
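One rough way to quantify the "valley" effect described in the comment above is to count how many order-4 states in a corpus have exactly one observed continuation; wherever that is the case, the chain has no choice but to reproduce the corpus verbatim. A small sketch (the corpus path is a placeholder, not from the thread):

```python
from collections import defaultdict

def single_continuation_fraction(words, n=4):
    """Fraction of order-n states that were only ever followed by one distinct word."""
    followers = defaultdict(set)
    for i in range(len(words) - n):
        followers[tuple(words[i:i + n])].add(words[i + n])
    if not followers:
        return 0.0
    return sum(1 for s in followers.values() if len(s) == 1) / len(followers)

with open("corpus.txt") as f:   # any plain-text corpus
    words = f.read().split()
print(f"order-4 states with a single continuation: {single_continuation_fraction(words):.1%}")
```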

  • Yeah, there's only two differences between using Markov chains to predict words and LLMs:

    * LLMs don't use Markov chains,
    * LLMs don't predict words.

  • They are definitely not Markov Chains; they may, however, be Markov Models. There's a difference between an MC and an MM.

    • What do you mean? The states are fully observable (the current array of tokens), and using an LLM we calculate the probabilities of moving between them (a concrete sketch follows these replies). What is not MC about this?

      5 replies →
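The reading in the reply above can be shown directly: given the current array of tokens (the state), a causal LLM returns a full probability distribution over the next token, i.e. the transition probabilities to every reachable next state. A minimal sketch, assuming the Hugging Face transformers library and the small gpt2 checkpoint (neither is specified in the thread):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

state = "The quick brown fox"                    # the Markov state: the tokens so far
inputs = tokenizer(state, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits              # shape: [1, seq_len, vocab_size]

# Transition probabilities out of the current state: P(next token | tokens so far)
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, 5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode([idx.item()])!r}: {p.item():.3f}")
```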

Don't know what happened. I stumbled onto a funny coincidence - me talking to an LLM about its similarities with MC - and decided to share it on a post about using MC to generate text. Got some nasty comments and a lot of downvotes, even though my comment sparked a pretty interesting discussion.

Hate to be that guy, but I remember this place being nicer.

  • Ever since LLMs became popular, there's been an epidemic of people pasting ChatGPT output onto forums (or, in your case, offering to). These posts are always received similarly to yours, so I'm skeptical that you're genuinely surprised by the reaction.

    Everyone has access to ChatGPT. If we wanted its "opinion" we could ask it ourselves. Your offer is akin to "Hey everyone, want me to Google this and paste the results page here?". You would never offer to do that. Ask yourself why.

    These posts are low-effort and add nothing to the conversation, yet the people who write them seem to expect everyone to be impressed by their contribution. If you can't understand why people find this irritating, I'm not sure what to tell you.

...are you under the impression that you have an exclusive relationship with "him"? Everyone else has access to ChatGPT too.

  • Yes. Yes I was. Thank you for the wake up call. I was under the impression that he was talking only to me.