Comment by viktor_von

1 year ago

> I remember one of the initial transformer people saying in an interview that they didn't think this was the "one true architecture" but a lot of the performance came from people rallying around it and pushing in the one direction.

You may be referring to Aidan Gomez (CEO of Cohere and contributor to the transformer architecture) in his Machine Learning Street Talk podcast interview. I agree: if as much attention had been put towards the RNN during the initial transformer hype, we may very well have seen these advancements earlier.