Comment by cs702
1 year ago
I finally got around to reading this. Nice paper, but it fails to address a key question about RNNs:
Can RNNs be as good as Transformers at recalling information from previous tokens in a sequence?
Transformers excel at recalling info, likely because they keep all previous context around in an ever-growing KV cache.
Unless proponents of RNNs conclusively demonstrate that RNNs can recall info from previous context at least as well as Transformers, I'll stick with the latter.
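To make the memory asymmetry the comment points at concrete, here is a minimal sketch (not from the paper; dimensions and update rule are hypothetical): a Transformer's KV cache grows with every token, so any past token can still be attended to exactly, while an RNN must compress the whole history into one fixed-size state.

```python
import numpy as np

d, T = 64, 1000  # hypothetical model width and sequence length

# Transformer-style KV cache: one (key, value) pair kept per token,
# so recall can look up any earlier token directly.
kv_cache = []
for t in range(T):
    k, v = np.random.randn(d), np.random.randn(d)
    kv_cache.append((k, v))          # memory grows as O(T * d)

# RNN-style recurrence: a single fixed-size state updated in place,
# so all previous tokens must be squeezed into d numbers.
W = np.random.randn(d, d) * 0.01
h = np.zeros(d)
for t in range(T):
    x = np.random.randn(d)
    h = np.tanh(W @ h + x)           # memory stays O(d), independent of T
```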