Comment by Al-Khwarizmi

2 years ago

I always say that both this and the BERT paper are breakthrough contributions, but quite awful papers (when we talk about the papers themselves, not the discoveries or the software). They're quite badly written and explained (and I don't think they're better than most, at least in NLP, which is what I typically read), and they both feel like post hoc rationalizations for massive trial and error. This is common in papers coming from big industry labs, to be honest. I tend to find papers from academia better written, although I may be biased due to being an academic myself.

"Masking is all you need" would be a better description.

  • What is "masking" in a paper that also has a section dedicated to mask segmentation ("masking" as in creating segmentation masks)?