Comment by Peteragain
4 days ago
So, are current LLMs better because artificial neural networks are better predictors than Markov models, or because of the scale of the training data? Just putting it out there...
Markov models usually only predict the next token given the two preceding tokens (trigram model) because the data gets so exceptionally sparse beyond that window that it becomes impossible to make reliable probability estimates (despite back-off, smoothing, etc.).
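To make the sparsity point concrete, here is a toy maximum-likelihood trigram model (my own sketch, not from any particular paper): any two-token context that never appears verbatim in the training text gets no estimate at all without back-off or smoothing.

    from collections import defaultdict, Counter

    def train_trigram(tokens):
        # Count next-token occurrences for each 2-token context.
        counts = defaultdict(Counter)
        for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
            counts[(a, b)][c] += 1
        return counts

    def predict(counts, a, b):
        # Maximum-likelihood estimate of P(next | a, b); None if the context was never seen.
        ctx = counts.get((a, b))
        if not ctx:
            return None  # sparse: unseen context, no estimate without back-off/smoothing
        total = sum(ctx.values())
        return {w: n / total for w, n in ctx.items()}

    corpus = "the cat sat on the mat and the cat sat on the rug".split()
    model = train_trigram(corpus)
    print(predict(model, "cat", "sat"))   # {'on': 1.0}
    print(predict(model, "dog", "sat"))   # None -- context never observed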
I recommend reading Bengio et al.'s 2003 paper, which describes this issue in more detail and introduces distributional representations (embeddings) in a neural language model to avoid this sparsity.
While we are using transformers and sentence pieces now, this paper aptly describes the motivation underpinning modern models.
https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf
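And a rough, untrained sketch of that idea in PyTorch (my own toy code, not the paper's exact architecture or notation): each word maps to a dense vector, and a small network predicts the next token from the concatenated context vectors, so a context never seen verbatim in training still gets a probability because similar words end up with similar vectors.

    import torch
    import torch.nn as nn

    class TinyNeuralLM(nn.Module):
        def __init__(self, vocab_size, embed_dim=32, context_len=2, hidden=64):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)    # dense word vectors (the embeddings)
            self.mlp = nn.Sequential(
                nn.Linear(context_len * embed_dim, hidden),
                nn.Tanh(),
                nn.Linear(hidden, vocab_size),                  # scores over the whole vocabulary
            )

        def forward(self, context_ids):                         # context_ids: (batch, context_len)
            vecs = self.embed(context_ids).flatten(start_dim=1) # concatenate context embeddings
            return self.mlp(vecs)                               # logits for the next token

    vocab = ["the", "cat", "dog", "sat", "on", "mat", "rug", "and"]
    model = TinyNeuralLM(len(vocab))
    logits = model(torch.tensor([[vocab.index("cat"), vocab.index("sat")]]))
    probs = logits.softmax(dim=-1)  # a probability even for contexts never seen verbatim in training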
> Markov models usually only predict the next token given the two preceding tokens (trigram model) because the data gets so exceptionally sparse beyond that
Of course; that's because it is a probability along a single dimension, with a chain length along that one dimension, while LLMs and NNs use multiple dimensions (they are meshed, not chained).
I really want to know what the result would look like with a few more dimensions, resulting in a Markov-mesh-type structure rather than a chain structure.
Thanks for the reference, and I stand corrected. And yes, I had looked at it a long time ago and will give it another read. But I think it is saying that RNNs are a means of approximating a statistical property of a collection of text. That property is what we today think of as "completion"? That is, glorified autocomplete, and not "distributed representations" of the world. Would you agree?
> distributed representations
Distributional representations, not distributed.
https://en.wikipedia.org/wiki/Distributional_semantics#Distr...