Comment by lxe
6 years ago
> using the probability of the next word computed by the GPT-2 language model
Can the same effect be achieved by looking at the actual probability of the next word from a large corpus of existing text (à la Markov chains)?
Reply
6 years ago
Less effectively. GPT-2 and a Markov chain are both predictive models; GPT-2 just happens to be a much more complex (and, in most cases, more accurate) model for English text, so fewer bits are required on average to encode the delta between its predictions and the actual text.
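A minimal sketch of why the better predictor compresses better: an entropy coder (arithmetic coding or similar) spends roughly -log2(p) bits to encode the word that actually occurs, where p is the probability the model assigned to that word. The probabilities below are made-up placeholders, not real GPT-2 or corpus statistics; the point is only that a model which puts more mass on the true next word pays fewer bits for it.

```python
import math

def bits_to_encode(prob_of_actual_word: float) -> float:
    """Ideal code length in bits for a word the model assigned probability p."""
    return -math.log2(prob_of_actual_word)

# Context: "the cat sat on the ..."; suppose the actual next word is "mat".
# Hypothetical probabilities, for illustration only.
markov_p = 0.02   # a bigram/Markov model spreads its mass over many plausible words
gpt2_p   = 0.30   # a stronger model concentrates more mass on the word that occurs

print(f"Markov-style model: {bits_to_encode(markov_p):.2f} bits")  # ~5.64 bits
print(f"GPT-2-like model:   {bits_to_encode(gpt2_p):.2f} bits")    # ~1.74 bits
```

Averaged over a whole text, that per-word gap is the difference in compressed size between the two models.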