Comment by tigershark
2 days ago
The biggest model they used has only 760M parameters, and it outperforms models an order of magnitude larger.
Gah dmn