Comment by devmor

2 years ago

Markov chains were one of the coolest discoveries of my programming career, and I spent years using them to build forum and social media bots for people, trained on their own post histories.
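
Roughly, a bot like that boils down to a table of "after these words, which words did this person write next?" plus a random walk over it. A minimal word-level sketch in Python, just for illustration (the `corpus.txt` path and the two-word prefix are placeholders, not the actual bot code):

```python
import random
from collections import defaultdict

def build_chain(text, order=2):
    # Map each `order`-word prefix to the list of words observed to follow it.
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        prefix = tuple(words[i:i + order])
        chain[prefix].append(words[i + order])
    return chain

def generate(chain, length=50):
    # Random-walk the chain: start at a random prefix, keep sampling followers.
    prefix = random.choice(list(chain.keys()))
    output = list(prefix)
    for _ in range(length):
        followers = chain.get(tuple(output[-len(prefix):]))
        if not followers:
            break
        output.append(random.choice(followers))
    return " ".join(output)

# corpus.txt stands in for someone's concatenated post history (hypothetical file).
with open("corpus.txt") as f:
    chain = build_chain(f.read())
print(generate(chain))
```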

I think that experience is part of why I’ve been generally unimpressed with a lot of LLM hype. Like yeah, it’s cool and definitely more useful than a Markov chain - but for the amount of resources that went into it I’d expect the gap to be quite a bit larger than it really is.

I also loved it when I found out about Markov chains and had a lot of fun playing with them, but I have a totally different view on the gap with respect to LLMs.

Markov chains were discovered in 1906. From then until a few years ago, advances in "building a better Markov chain" were modest (e.g., smoothing techniques).

Then, in the last 5 years, LLMs arrive and suddenly you have an "uber Markov chain" that actually generates syntactically coherent text. You can even ask it things, and if the question is well-posed and makes sense you'll get a true answer at least the majority of the time. They can be a daily tool (for practical purposes, beyond fun), help you solve problems, and write interesting creative stories. A much larger leap in those 5 years than in the previous century!

I see them as what I always dreamed Markov chains could be, but never were. The gap is huge.

> I think that experience is part of why I’ve been generally unimpressed with a lot of LLM hype.

There are fundamental scale and architectural differences between LLMs and Markov chains. It's like saying you're not impressed by indoor plumbing because you have experience carrying your water from a well, and both do the same thing - transporting water to your home.

In both cases, this line of logic ignores the improvements that were needed to make such a thing possible at all, and the difference in relative usefulness you can get out of LLMs compared to Markov chains.

I first learned about Markov chains in the late 90s when I ran across a Markov chain bot on IRC. The person who wrote it was nice and answered my questions about it. I managed to write my own shitty copy and was pleased enough with myself. Same as you, I thought it was one of the coolest things I'd come across in programming.

Same as you, that experience is also why I've been very cool toward LLM hype. Not that LLMs aren't feats of their own, but they're not nearly as smart as their hype suggests.

The gap between GPT-4 and Markov chains is huge, though. Even the gap between GPT-4 and GPT-3.5 seems obviously huge to me in terms of what they are able to do.