Comment by tgv
1 year ago
That problem has plagued RNNs since the '90s: there's an information precision problem (how many bits do the older states need to carry?), a decay problem (the oldest information is the weakest), and a mixing problem (the hidden state tends to mix/sum representations together).
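To make the decay and mixing points concrete, here is a toy sketch, not a real RNN: a scalar linear recurrence with no nonlinearity and no learned weights. The function names and the `decay` value of 0.5 are made up for illustration.

```python
def run_linear_rnn(inputs, decay=0.5):
    """Toy scalar recurrence (assumed for illustration): h_t = decay * h_{t-1} + x_t."""
    h = 0.0
    for x in inputs:
        h = decay * h + x
    return h


def contribution_weights(n_steps, decay=0.5):
    """Weight each input carries in the final state, oldest first: decay ** age."""
    return [decay ** (n_steps - 1 - t) for t in range(n_steps)]


# Decay: the oldest input carries the smallest weight in the final state.
weights = contribution_weights(4, decay=0.5)

# Mixing: the state is a single weighted sum, so different histories can collide.
state_a = run_linear_rnn([2.0, 0.0])
state_b = run_linear_rnn([0.0, 1.0])

print(weights)           # oldest first
print(state_a, state_b)  # two different sequences, one final state
```

With these numbers the oldest of four inputs is weighted 0.125 against 1.0 for the newest, and both sequences end in the same state, so nothing downstream can tell them apart. Real RNNs have vector states and nonlinearities, so treat this only as intuition for why old information fades and gets blended.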