Comment by Legend2440

4 days ago

Google built some very large n-gram models around twenty years ago. This being before the era of ultra-high-speed internet, the data was distributed as a set of 6 DVDs.

It achieved state-of-the-art performance at tasks like spelling correction at the time. However, unlike an LLM, it can't generalize at all: if an n-gram isn't in the training corpus, it has no idea how to handle it.

https://research.google/blog/all-our-n-gram-are-belong-to-yo...

I have this DVD set in my basement. Technically, there are still methods for estimating the probability of unseen n-grams. Backing off to (or interpolating with) lower-order n-grams is one option. You can also impose a prior distribution, Bayesian-style, so that you can still make "rational" guesses.
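A minimal sketch of what that looks like, in Python with a made-up toy corpus and a "stupid backoff" constant of 0.4 (both are just placeholders, nothing from the Google release):

    # Fall back to a scaled unigram estimate when the bigram was never seen.
    from collections import Counter

    corpus = "the cat sat on the mat the cat ate".split()  # toy data
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    total = sum(unigrams.values())

    def score(prev, word, alpha=0.4):
        if (prev, word) in bigrams:
            return bigrams[(prev, word)] / unigrams[prev]
        return alpha * unigrams[word] / total  # back off to the unigram

    print(score("the", "cat"))  # seen bigram -> relative frequency
    print(score("mat", "ate"))  # unseen bigram -> scaled unigram guess

The Bayesian version of the same idea is something like add-one (Laplace) smoothing, where a symmetric Dirichlet prior over the vocabulary keeps unseen n-grams from getting probability zero.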

N-grams are surprisingly powerful for how little computation they require. They can be trained in seconds even with tons of data, since "training" is essentially just counting.
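To give a feel for that, here's a rough sketch (synthetic tokens, not real data) showing that building a trigram model is a single counting pass:

    # "Training" an n-gram model is just counting: one pass, no gradients.
    import random, time
    from collections import Counter

    tokens = [random.choice("abcdefgh") for _ in range(1_000_000)]  # synthetic corpus

    start = time.perf_counter()
    trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
    print(f"{len(trigrams)} distinct trigrams counted in {time.perf_counter() - start:.2f}s")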