Comment by cs702

1 year ago

I finally got around to reading this. Nice paper, but it fails to address a key question about RNNs:

Can RNNs be as good as Transformers at recalling information from previous tokens in a sequence?

Transformers excel at recalling info, likely because they keep all previous context around in an ever-growing KV cache.
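To make the contrast concrete, here's a minimal sketch (not from the paper; names like `toy_attention_step` and `toy_rnn_step` are purely illustrative) of the two memory models: attention appends every token's key/value pair to a cache that grows with sequence length, while an RNN folds each token into one fixed-size state vector.

```python
import numpy as np

d = 8  # hypothetical model/state dimension

def toy_attention_step(kv_cache, k, v, q):
    """Append the new (k, v) pair, then attend over the whole cache.
    The cache grows by one entry per token, so recalling any earlier
    token stays a direct lookup -- at O(T) memory and O(T) work per step."""
    kv_cache.append((k, v))
    keys = np.stack([k_ for k_, _ in kv_cache])   # (T, d)
    vals = np.stack([v_ for _, v_ in kv_cache])   # (T, d)
    scores = keys @ q / np.sqrt(d)                # (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ vals, kv_cache

def toy_rnn_step(state, x, W_h, W_x):
    """Fold the new token into a fixed-size state vector.
    Memory stays O(1) in sequence length, but anything the model will
    ever need to recall has to survive compression into this one vector."""
    return np.tanh(W_h @ state + W_x @ x)

rng = np.random.default_rng(0)
W_h, W_x = rng.normal(size=(d, d)), rng.normal(size=(d, d))

kv_cache, state = [], np.zeros(d)
for t in range(1000):
    x = rng.normal(size=d)
    _, kv_cache = toy_attention_step(kv_cache, k=x, v=x, q=x)
    state = toy_rnn_step(state, x, W_h, W_x)

print(len(kv_cache), "cached (k, v) pairs vs. a single state vector of shape", state.shape)
```

That asymmetry is the whole question: the Transformer never has to decide in advance what to remember, whereas the RNN must commit to what fits in its state.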

Unless proponents of RNNs conclusively demonstrate that RNNs can recall information from previous context at least as well as Transformers, I'll stick with the latter.