Comment by shenberg
10 days ago
The short and unsatisfying answer is that LLM generation is a Markov chain, except that instead of counting n-grams to build the next-token distribution, the training process compresses those statistics into the LLM's weights.
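To make the analogy concrete, here's a minimal sketch (my own toy example, not from any particular paper) of the counting approach: the last n-1 tokens play the role of the Markov state, and the next-token distribution is just the empirical histogram of continuations seen in the training text. An LLM swaps the lookup table for a learned function of the context, but sampling proceeds the same way.

```python
from collections import defaultdict, Counter
import random

def train_ngram(tokens, n=3):
    """Count n-grams: map each (n-1)-token context to a histogram of next tokens."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        counts[context][tokens[i + n - 1]] += 1
    return counts

def sample_next(counts, context):
    """Sample the next token from the empirical next-token distribution for this context."""
    histogram = counts[tuple(context)]
    tokens, weights = zip(*histogram.items())
    return random.choices(tokens, weights=weights)[0]

corpus = "the cat sat on the mat and the cat ate the rat".split()
counts = train_ngram(corpus, n=3)
print(sample_next(counts, ["the", "cat"]))  # 'sat' or 'ate', weighted by counts
```

In an LLM, `sample_next` is the same loop, only the histogram lookup is replaced by a forward pass that maps the context to logits over the vocabulary.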
There was an interesting paper a while back that investigated using unbounded n-gram models as a complement to LLMs: https://arxiv.org/pdf/2401.17377 (I found the implementation clever, and I'm somewhat surprised it received so little follow-up work.)