Comment by monoidl

5 days ago

I think this is more correctly described as a trigram model than a Markov model; if it naturally expanded to 4-grams when they were available, etc., the text would look more coherent.

IIRC there was some research on "infini-gram", that is, a very large n-gram model, which allegedly got performance close to LLMs in some domains a couple of years back.

Google made some very large n-gram models around twenty years ago. This being before the era of ultra-high-speed internet, the data was distributed as a set of 6 DVDs.

It achieved state-of-the-art performance on tasks like spelling correction at the time. However, unlike an LLM, it can't generalize at all: if an n-gram isn't in the training corpus, it has no idea how to handle it.

https://research.google/blog/all-our-n-gram-are-belong-to-yo...

  • I have this DVD set in my basement. Technically, there are still methods for estimating the probability of unseen n-grams. Backoff (falling back to, or interpolating with, lower-order n-grams) is one option. You can also impose prior distributions, Bayesian-style, so that you can make "rational" guesses. There's a quick sketch below.

    N-grams are surprisingly powerful for how little computation they require. They can be trained in seconds even with tons of data.
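
    A quick Python sketch of what I mean (the toy corpus and the usual 0.4 backoff discount are just illustrative assumptions, not anything from the Google release): counting is a single pass over the tokens, raw counts give nothing for an unseen trigram, and backoff still produces a score for it.

      # Toy n-gram counter plus "stupid backoff" scoring; the corpus,
      # tokenizer, and 0.4 discount are illustrative assumptions.
      from collections import Counter

      def train(tokens, n=3):
          # Count every k-gram for k = 1..n in one pass.
          counts = Counter()
          for k in range(1, n + 1):
              for i in range(len(tokens) - k + 1):
                  counts[tuple(tokens[i:i + k])] += 1
          return counts

      def stupid_backoff(counts, context, word, alpha=0.4):
          # Score word given context; if the full n-gram is unseen,
          # back off to shorter contexts, discounting by alpha each step.
          for start in range(len(context) + 1):
              hist = tuple(context[start:])
              ngram = hist + (word,)
              denom = counts[hist] if hist else sum(
                  c for g, c in counts.items() if len(g) == 1)
              if counts[ngram] > 0 and denom > 0:
                  return (alpha ** start) * counts[ngram] / denom
          return 0.0

      tokens = "the cat sat on the mat the cat sat on the rug".split()
      counts = train(tokens)
      print(counts[("mat", "on", "the")])                  # 0: unseen trigram
      print(stupid_backoff(counts, ("sat", "on"), "the"))  # seen trigram: 1.0
      print(stupid_backoff(counts, ("mat", "on"), "the"))  # unseen: backs off to the bigram, 0.4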