Comment by sigmoid10
3 hours ago
Pure gradient descent is not what happens in either field, but e.g. momentum is just additional state constructed from historic gradients. While it is unlikely that the brain runs backpropagation the way you see it implemented in modern ML (the same goes for TD, by the way), the core principle pretty much has to be the same from a large-scale, high-dimensional network efficiency point of view. On top of that, adaptive plasticity is almost by definition about estimating useful directions of change. The key insight here would be that the brain does gradient estimation quite cheaply, and modern ML can probably still learn a thing or two from it.
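To make the momentum point concrete, here is a minimal sketch of the standard heavy-ball formulation: the velocity buffer is nothing more than an exponentially weighted accumulation of past gradients, and the parameter step follows that accumulated direction. The learning rate, decay factor, and toy quadratic objective below are illustrative choices, not anything from the comment.

```python
import numpy as np

def sgd_momentum_step(params, grad, velocity, lr=0.01, beta=0.9):
    """One SGD-with-momentum update.

    The velocity is an exponentially weighted running sum of past
    gradients, so the update direction is built from gradient history
    rather than the current gradient alone.
    """
    velocity = beta * velocity + grad   # accumulate historic gradients
    params = params - lr * velocity     # step along the accumulated direction
    return params, velocity

# Toy usage: minimise f(x) = ||x||^2, whose gradient is 2x.
x = np.array([3.0, -4.0])
v = np.zeros_like(x)
for _ in range(100):
    x, v = sgd_momentum_step(x, 2 * x, v)
print(x)  # approaches the minimum at the origin
```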