Comment by sebzim4500

3 years ago

Am I understanding correctly that the models are being tested on languages they've barely or never seen before?

How is it surprising that they would not be able to beat basic statistical techniques? That seems intuitive to me.

Also the title is misleading at best.


sebzim4500

Yes, it only outperforms on out-of-distribution data, and even then it only outperforms BERT, not bigger transformers. It's a win for sure, but a very misleading title.