Comment by gwern

6 years ago

Nope, the paper says it's just using the FastText word embedding: https://nips2018creativity.github.io/doc/entendrepreneur.pdf — which is just a particularly well-done word2vec: https://arxiv.org/pdf/1712.09405.pdf. 'Lasker' and 'chess' no doubt co-occur quite strongly in their Internet corpus.

On the other hand, it comes up with "Beagle" and "Labrador" as synonyms of "cat". Color me unimpressed.

This seems like something that has been done before, and the fact that the paper has no references suggests the author may not have done background research. While [0] is different, I could have sworn I've seen a paper that discussed creating puns in this fashion.

[0] http://www.aclweb.org/anthology/W05-1614

  • > On the other hand, it comes up with "Beagle" and "Labrador" as synonyms of "cat". Color me unimpressed.

    'It doesn't matter whether a cat has floppy ears or yellow fur so long as it catches mice.'

  • fasttext and word2vec only encode contextual similarity; they're not meant to generate synonyms.

    • So you agree with me, then. You can't say "What do you call a sleepy cat? A /grumbea-gle/ (grumpy-beagle)!" It's a portmanteau, sure, but it isn't really even close to the input. Using word2vec here is probably wrong, or should at least use different pruning for word variance. It's a flaw of the technique and of the author's choices.
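The distinction being argued above can be sketched with toy co-occurrence vectors. These counts are entirely hypothetical (not real FastText or word2vec output); the point is only that cosine similarity over shared contexts rates "beagle" far closer to "cat" than an unrelated word, even though it is not a synonym:

```python
# Minimal sketch: cosine similarity over (hypothetical) co-occurrence
# vectors conflates true synonyms with merely contextually similar words.
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Made-up counts of co-occurrence with context words [pet, mouse, bark, meow, rock].
vectors = {
    "cat":    [9, 7, 0, 8, 0],
    "kitten": [8, 5, 0, 9, 0],  # a true near-synonym of "cat"
    "beagle": [9, 6, 8, 0, 0],  # shares "pet" contexts, but not a synonym
    "stone":  [0, 0, 0, 0, 9],  # unrelated
}

for word in ("kitten", "beagle", "stone"):
    print(word, round(cosine(vectors["cat"], vectors[word]), 3))
# kitten 0.986
# beagle 0.656
# stone 0.0
```

"Beagle" scores well above an unrelated word because it appears in the same kinds of contexts as "cat" — which is exactly why a portmanteau generator that treats high cosine similarity as synonymy will substitute dog breeds for "cat".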