Comment by londogard
2 years ago
I'd like to note that gzip is only stronger on the news datasets.
On Yahoo Questions it is not the top performer. It's not far-fetched to think that news articles are written in a similar way, sometimes even partly copied, and therefore have many words in common. Yahoo Questions is a forum, and I'd expect a greater variation of words there, but words that themselves have a semantic similarity.
That is, gzip is strong when many words overlap exactly (concatenating two overlapping texts barely increases the compressed size), but when the similarity is semantic, DNNs win every day.
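For concreteness, here's a minimal sketch of the compression-distance idea (Normalized Compression Distance, which the paper feeds into a kNN classifier) using Python's built-in gzip module; the example sentences are made up to show the overlap effect:

```python
import gzip

def clen(s: str) -> int:
    # Compressed length in bytes
    return len(gzip.compress(s.encode("utf-8")))

def ncd(x: str, y: str) -> float:
    # Normalized Compression Distance: if x and y share many exact
    # substrings, compressing them together barely costs more than
    # compressing the longer one alone, so NCD is small.
    cx, cy, cxy = clen(x), clen(y), clen(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# High lexical overlap: near-identical news-style sentences
a = "Stocks fell sharply on Monday as investors weighed inflation data."
b = "Stocks fell sharply on Tuesday as investors weighed jobs data."
# Semantically similar, but almost no shared wording
c = "Equities dropped after traders digested new price figures."

print(ncd(a, b))  # small: many exact substrings overlap
print(ncd(a, c))  # larger: gzip sees little literal overlap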
The results are interesting, but not as interesting as they sound, IMO.
How do they work, then, such that semantic similarity would be any different? Surely that's just a matter of grouping semantically similar 'representations' during training?
Yes, what I'm saying is that gzip does not perform as well when tokens don't overlap exactly.
Gzip has no "semantic" mode, hence it won't, and does not (according to the paper's metric), perform as well.
Deep learning can capture these semantic similarities.
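For contrast, a minimal sketch of the embedding approach, assuming the sentence-transformers package and its pretrained all-MiniLM-L6-v2 model (any sentence encoder would do); cosine similarity stays high for paraphrases even with almost no shared tokens:

```python
# Requires: pip install sentence-transformers (assumed available)
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

a = "Stocks fell sharply on Monday as investors weighed inflation data."
b = "Equities dropped after traders digested new price figures."

# The encoder places semantically similar texts near each other in
# vector space, even when they share almost no exact words.
emb = model.encode([a, b], convert_to_tensor=True)
print(util.cos_sim(emb[0], emb[1]).item())  # high despite little lexical overlap
```

This is the distinction in the comment above: gzip's distance only drops when literal substrings repeat, while a learned embedding can score paraphrases as close.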