Comment by omneity
1 day ago
Thanks, this was helpful! Reading the seminal paper[0] on Universal Transformers also gave some insights:
> UTs combine the parallelizability and global receptive field of feed-forward sequence models like the Transformer with the recurrent inductive bias of RNNs.
Very interesting: it seems to be an “old” architecture that is only now being leveraged to promising effect. Curious what made it an active area again (with the work from Samsung and Sapient, and now this one); perhaps diminishing returns on regular transformers?
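For anyone less familiar, the quoted idea boils down to a single weight-tied transformer block applied recurrently over depth: attention inside the block gives the global receptive field at every step, and reusing the same weights each step is the recurrent inductive bias. A minimal PyTorch sketch of that idea (class names, hyperparameters, and the fixed step count are illustrative, not taken from the paper, which also describes adaptive halting):

```python
import torch
import torch.nn as nn

class UniversalTransformerEncoder(nn.Module):
    def __init__(self, d_model=128, nhead=4, steps=6):
        super().__init__()
        # One shared block, reused at every depth step (weight tying).
        self.shared_block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        self.steps = steps
        # Per-step embedding so the block can tell iterations apart
        # (stand-in for the paper's timestep encoding).
        self.step_emb = nn.Embedding(steps, d_model)

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        for t in range(self.steps):
            x = x + self.step_emb(torch.tensor(t, device=x.device))
            x = self.shared_block(x)  # same weights every iteration
        return x

# Usage: a batch of 2 sequences of length 10.
model = UniversalTransformerEncoder()
out = model(torch.randn(2, 10, 128))
print(out.shape)  # torch.Size([2, 10, 128])
```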