Comment by duskwuff

6 years ago

Less effectively. GPT-2 and a Markov chain are both predictive models; GPT-2 just happens to be a much more complex (and, in most cases, more accurate) model of English text, so fewer bits are required, on average, to encode the actual text given its predictions.
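
To illustrate the "better predictions, fewer bits" point, here is a toy sketch. It is not GPT-2 and not anyone's actual compressor. It assumes an arithmetic coder that spends about -log2(p) bits on each symbol the model gave probability p, and it only computes that ideal code length rather than producing a bitstream. The models are adaptive character-level Markov chains with add-one smoothing. That smoothing choice, the known-in-advance alphabet and the function name `code_length_bits` are all illustrative assumptions. Higher-order contexts predict repetitive English-like text better, so the total comes out lower.

```python
import math
from collections import defaultdict


def code_length_bits(text, order, alphabet):
    """Ideal code length (bits) if an arithmetic coder were driven by an
    adaptive order-`order` character Markov model with add-one smoothing.

    Hypothetical helper: counts are updated only after each symbol is coded,
    so a decoder that knows `alphabet` could rebuild the same predictions.
    """
    counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
    totals = defaultdict(int)                        # context -> total count
    k = len(alphabet)
    bits = 0.0
    for i, ch in enumerate(text):
        ctx = text[max(0, i - order):i]  # previous `order` chars ("" for order 0)
        p = (counts[ctx][ch] + 1) / (totals[ctx] + k)  # smoothed prediction
        bits += -math.log2(p)            # cost of coding the actual symbol
        counts[ctx][ch] += 1
        totals[ctx] += 1
    return bits


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog. " * 20
    alphabet = sorted(set(sample))
    uniform = len(sample) * math.log2(len(alphabet))
    print(f"uniform: {uniform / len(sample):.3f} bits/char")
    for order in (0, 1, 2, 3):
        bits = code_length_bits(sample, order, alphabet)
        print(f"order {order}: {bits / len(sample):.3f} bits/char")
```

Swapping in a stronger predictor (such as a large language model) changes only where `p` comes from. The coding cost is still the sum of -log2(p) over the symbols that actually occur.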
