Comment by jph00
1 year ago
Highway networks add a skip connection, but LSTMs don't. Btw you might be interested in truncated backprop thru time, which we introduced in our ULMFiT paper.
1 year ago
I was referring to how the context vectors help avoid vanishing gradients by behaving very similarly to skip connections, but yes, they aren't skip connections as such. That's been my understanding, at least.
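To make that intuition concrete, here is a rough scalar sketch. It assumes "context vector" means the LSTM cell state, and it ignores the gates' own dependence on the hidden state, so it is a simplification rather than a full LSTM. The function names and the numbers (w=0.9, forget gate 0.99, 50 steps) are made up for illustration.

```python
import math

def vanilla_rnn_grad(w, h0, steps):
    """d h_T / d h_0 for the scalar recurrence h_t = tanh(w * h_{t-1})."""
    h, grad = h0, 1.0
    for _ in range(steps):
        h = math.tanh(w * h)
        grad *= w * (1.0 - h * h)  # chain rule: each step multiplies by w * tanh'
    return grad

def cell_state_grad(forget_gates):
    """d c_T / d c_0 for c_t = f_t * c_{t-1} + i_t * g_t, with gates held fixed."""
    grad = 1.0
    for f in forget_gates:
        grad *= f  # the additive update contributes only the factor f_t
    return grad

steps = 50
rnn_grad = vanilla_rnn_grad(w=0.9, h0=0.5, steps=steps)
lstm_grad = cell_state_grad([0.99] * steps)
print(f"vanilla RNN gradient after {steps} steps: {rnn_grad:.2e}")
print(f"cell-state gradient after {steps} steps:  {lstm_grad:.2e}")
```

With the forget gate close to 1, the cell-state path passes gradients through almost unchanged, which is the sense in which it resembles an identity skip connection. It is still gated, though, so with a forget gate well below 1 the product shrinks just like in the plain RNN.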
We haven't tried truncated BPTT, but we certainly should.
Funnily enough, we adopted AWD-LSTMs, Ranger21, and Mish in the paper I linked after hearing about them through the fast.ai community (we also trialled QRNNs for a bit). fast.ai has been hugely influential in my work.