Comment by aDyslecticCrow
1 year ago
LSTM and GRU did not fully solve the issue, but they made it far less severe. Overall, recurrent units are notoriously prone to vanishing and exploding gradients.
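You can see this directly by checking how much gradient survives backpropagation through time. A minimal sketch, assuming PyTorch and arbitrary illustrative sizes (`seq_len`, `dim` are made up for the demo):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

seq_len, batch, dim = 200, 1, 32  # hypothetical sizes, chosen for illustration


def first_step_grad_norm(cell):
    """Gradient norm at the earliest input after backprop through time."""
    x = torch.randn(seq_len, batch, dim, requires_grad=True)
    out, _ = cell(x)
    out[-1].sum().backward()        # loss depends only on the final output
    return x.grad[0].norm().item()  # how much gradient reaches timestep 0


rnn = nn.RNN(dim, dim)    # vanilla recurrent unit (tanh)
lstm = nn.LSTM(dim, dim)  # gated unit with an additive cell state

print("vanilla RNN:", first_step_grad_norm(rnn))
print("LSTM:       ", first_step_grad_norm(lstm))
# Typically the vanilla RNN's gradient at timestep 0 has all but vanished,
# while the LSTM's is noticeably larger -- mitigated, not solved.
```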
I don't want to downplay the value of these models. Some people seem to be under the impression that transformers replaced them or made them obsolete, which is far from the truth.