Comment by lxe
6 years ago
> using the probability of the next word computed by the GPT-2 language model
Can the same effect be achieved by looking at the actual probability of the next word from a large corpus of existing text (à la Markov chains)?
Reply
6 years ago
Less effectively. GPT-2 and a Markov chain are both predictive models; GPT-2 just happens to be a much more complex (and, in most cases, more accurate) model for English text, so fewer bits are required on average to encode the delta between its predictions and the actual text.
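A minimal sketch of why the better predictor compresses better: an entropy coder (arithmetic coding or similar) spends roughly -log2(p) bits to encode the word that actually occurs, where p is the probability the model assigned to that word. The probabilities below are made-up placeholders, not real GPT-2 or corpus statistics; the point is only that a model which puts more mass on the true next word pays fewer bits for it.

```python
import math

def bits_to_encode(prob_of_actual_word: float) -> float:
    """Ideal code length in bits for a word the model assigned probability p."""
    return -math.log2(prob_of_actual_word)

# Context: "the cat sat on the ..."; suppose the actual next word is "mat".
# Hypothetical probabilities, for illustration only.
markov_p = 0.02   # a bigram/Markov model spreads its mass over many plausible words
gpt2_p   = 0.30   # a stronger model concentrates more mass on the word that occurs

print(f"Markov-style model: {bits_to_encode(markov_p):.2f} bits")  # ~5.64 bits
print(f"GPT-2-like model:   {bits_to_encode(gpt2_p):.2f} bits")    # ~1.74 bits
```

Averaged over a whole text, that per-word gap is the difference in compressed size between the two models.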