Comment by larodi
5 days ago
Indeed, is there a chance Google did not evaluate properly what the transformer will eventually be used for/become. It was created for translation as an improvement on seq2seq, right? Which was for translation, not for thinking, and to a certain extent... still is about translation, and are not other emergent capabilities actually a side-effect, only observed later when parameter size grew?
No comments yet
Contribute on Hacker News ↗