Comment by duskwuff
6 years ago
Less effectively. GPT-2 and a Markov chain are both predictive models; GPT-2 just happens to be a much more complex (and, in most cases, more accurate) model for English text, so fewer bits are required on average to encode the delta between its predictions and the actual text.
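To make the "fewer bits" point concrete, here is a rough sketch (not anything from the original thread) that compares two character-level Markov models of different order instead of GPT-2. The function name `ideal_code_length_bits`, the add-one smoothing, and the assumption that the decoder already knows the alphabet are all illustrative choices of mine. It sums -log2(p) over the symbols that actually occur, which is roughly what an ideal arithmetic coder driven by the model would spend.

```python
import math
from collections import defaultdict


def ideal_code_length_bits(text: str, order: int) -> float:
    """Approximate bits needed to encode `text` with an adaptive
    order-`order` character model feeding an ideal entropy coder.

    Assumptions (illustrative only): add-one smoothing, and the
    alphabet (distinct characters of `text`) is known to the decoder.
    """
    alphabet_size = len(set(text))
    # counts[context][char] = how often `char` has followed `context` so far
    counts = defaultdict(lambda: defaultdict(int))
    total_bits = 0.0
    for i, ch in enumerate(text):
        # The previous `order` characters (shorter near the start of the text).
        context = text[max(0, i - order):i]
        seen = counts[context]
        total = sum(seen.values())
        # Model's predicted probability for the character that actually occurs.
        p = (seen[ch] + 1) / (total + alphabet_size)
        # Cost of encoding the actual character given that prediction.
        total_bits += -math.log2(p)
        # Update the model only after coding, so a decoder could mirror it.
        seen[ch] += 1
    return total_bits


if __name__ == "__main__":
    sample = "the cat sat on the mat. " * 40
    for order in (0, 1, 2):
        bits = ideal_code_length_bits(sample, order)
        print(f"order {order}: {bits:.0f} bits ({bits / len(sample):.2f} bits/char)")
```

On repetitive text like this sample, the higher-order model predicts the next character more accurately and so needs fewer bits in total. The same reasoning applies when GPT-2 takes the place of the Markov chain as the predictor.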