cztomsik 2 months ago
hm, residual is what I would not expect, can you elaborate why?
simsla 2 months ago
It avoids vanishing gradients in deeper networks: the skip connection gives the gradient a direct path around each block.
Also, most blocks with a residual approximate the identity function when initialised, so they tend to be well behaved.
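To make both points concrete, here is a rough NumPy sketch. It is a toy illustration, not anyone's actual model. The names `residual_block`, `w_in` and `w_out`, the sizes, the 0.01 weight scale and the zero-initialised output projection are all assumptions picked for the demo. The first part checks that such a block is exactly the identity at init. The second part multiplies per-layer Jacobians through 50 layers, once for plain layers (`J`) and once for residual layers (`I + J`).

```python
import numpy as np

rng = np.random.default_rng(0)
dim, depth = 16, 50  # illustrative sizes, not from the thread


def residual_block(x, w_in, w_out):
    # y = x + F(x), where F is a small two-layer ReLU branch
    return x + w_out @ np.maximum(w_in @ x, 0.0)


# 1) Identity at initialisation.
# Assumption: the branch's output projection starts at zero, so F(x) = 0.
x = rng.standard_normal(dim)
w_in = rng.standard_normal((dim, dim))
w_out = np.zeros((dim, dim))
identity_at_init = np.allclose(residual_block(x, w_in, w_out), x)

# 2) Gradient flow through a deep stack.
# Each layer's branch is modelled as a linear map with small random weights,
# so its Jacobian is just that weight matrix J.
plain = np.eye(dim)      # accumulated Jacobian for y = F(x)
residual = np.eye(dim)   # accumulated Jacobian for y = x + F(x)
for _ in range(depth):
    J = 0.01 * rng.standard_normal((dim, dim))
    plain = J @ plain
    residual = (np.eye(dim) + J) @ residual

plain_norm = np.linalg.norm(plain)        # shrinks towards zero with depth
residual_norm = np.linalg.norm(residual)  # stays on the order of the identity's norm

print(identity_at_init, plain_norm, residual_norm)
```

With small weights, the plain product collapses because every factor shrinks the signal. The residual product stays close to the identity because every factor is `I` plus a small perturbation.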