Comment by elchananHaas
3 days ago
A thought on why the intermediate L2 losses are important: in the early layers there is little information about the target, so the L2 loss will be high and the images blurry. In much deeper layers the information from the argmins dominates, and there is little left to learn. The L2 losses from the intermediate layers fill this gap by providing a good training signal when some information about the target is known but there are still large unknowns.
The model can be thought of as N Discrete Distribution Networks, one for each depth from 1 to N, stacked on top of each other and trained simultaneously.
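To make the "N stacked networks" picture concrete, here is a rough toy sketch, not the actual DDN code. Each "layer" is a hypothetical stand-in that proposes a few fixed refinements of the previous output (a real layer would be learned), the argmin against the target is kept, and the L2 loss of that pick is recorded per layer. The names `make_offsets`, `forward_with_losses`, and the step sizes are all my own assumptions for illustration.

```python
def l2(a, b):
    """Squared L2 distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def make_offsets(step):
    """Hypothetical candidate refinements for a 2-D output.

    The zero offset lets a layer keep the previous output unchanged.
    """
    return [[0.0, 0.0], [step, step], [step, -step], [-step, step], [-step, -step]]


def forward_with_losses(target, steps):
    """Run the toy stack and return the final output plus one L2 loss per layer."""
    current = [0.0] * len(target)
    losses = []
    for step in steps:
        # Each layer proposes K candidates conditioned on the previous pick.
        candidates = [[c + o for c, o in zip(current, off)] for off in make_offsets(step)]
        dists = [l2(c, target) for c in candidates]
        # The argmin selects the candidate closest to the target.
        best = min(range(len(candidates)), key=dists.__getitem__)
        current = candidates[best]
        # losses[n - 1] is the loss a depth-n network would report on its own.
        losses.append(dists[best])
    return current, losses


target = [1.0, -0.5]
final, losses = forward_with_losses(target, steps=[1.0, 0.5, 0.25])
# Summing the per-layer losses trains all N prefixes at once.
total = sum(losses)
print(losses, total)
```

In this toy the per-layer loss can only stay flat or shrink with depth, because each layer may keep the previous pick. That guarantee comes from my zero offset, so don't read it as a property of the real model.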