Comment by sebzim4500
2 years ago
Am I understanding correctly that the models are being tested on languages they've barely/never seen before?
How is it surprising that they would not be able to beat basic statistical techniques? That seems intuitive to me.
Also the title is misleading at best.
Yes, it only outperforms on out-of-distribution data, and even then it only beats BERT, not bigger transformers. It's a win for sure, but the title is very misleading.