Comment by chadcmulligan

14 hours ago

"why do neural networks work better than other models?" That sounds really interesting - any references (for a non specialist)?

https://en.wikipedia.org/wiki/Universal_approximation_theore...

the better question is why does gradient descent work for them

  • The properties that the universal approximation theorem proves are not unique to neural networks.

    Any models using an infinite dimensional Hilbert space, such as SVMs with RBF or polynomial kernels, Gaussian process regression, gradient boosted decision trees, etc. have the same property (though proven via a different theorem of course).

    So the universal approximation theorem tells us nothing about why we should expect neural networks to perform better than those models.

    • Extremely well said. Universal approximation is necessary but not sufficient for the performance we are seeing. The secret sauce is implicit regularization, which comes about analogously to enforcing compression.

      3 replies →

    • Universal approximation is like saying that a problem is computable

      sure, that gives some relief - but it says nothing about practice, unlike e.g. which side of the P/NP divide the problem falls on

      2 replies →
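To make the parent comment's point concrete, here is a minimal sketch (assuming only NumPy; all parameter values are illustrative) of one of the non-neural models mentioned above - RBF kernel ridge regression - driving its approximation error on a smooth function toward zero. The same universal-approximation property the theorem gives to neural networks shows up here with no neural network in sight:

```python
import numpy as np

def rbf_kernel(a, b, gamma=10.0):
    # Gram matrix K[i, j] = exp(-gamma * (a_i - b_j)^2)
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

rng = np.random.default_rng(0)
x_train = rng.uniform(-1, 1, 200)
y_train = np.sin(3 * x_train)

# Kernel ridge regression has a closed-form fit: alpha = (K + lam*I)^-1 y
K = rbf_kernel(x_train, x_train)
alpha = np.linalg.solve(K + 1e-6 * np.eye(len(x_train)), y_train)

# Evaluate the fitted function on a fresh grid
x_test = np.linspace(-1, 1, 100)
y_pred = rbf_kernel(x_test, x_train) @ alpha
max_err = np.max(np.abs(y_pred - np.sin(3 * x_test)))
print(max_err)  # small; shrinks further with more training points
```

Note there is no gradient descent here at all - the fit is a single linear solve - which is exactly why universal approximation by itself can't explain what's special about neural network training.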

  • Interestingly, there exist problems which provably can't be learned via gradient descent.

  • I don't follow. Why wouldn't it work? It seems to me that a biased random walk down a gradient is about as universal as it gets. A bit like asking why walking uphill eventually results in you arriving at the top.

    • It wouldn't work if your landscape has more local minima than atoms in the known universe (which it does) and only some of them are good. Neural networks can easily fail, but there are a lot of things one can do to help ensure it works.

      10 replies →
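A toy sketch of the local-minima point (pure Python, all numbers illustrative): plain gradient descent on a non-convex 1-D function with two minima of different quality. Which minimum you end up in depends entirely on where you start - a miniature version of why "walk downhill" is not obviously enough in a landscape with astronomically many minima:

```python
def f(x):
    # Non-convex: a good (global) minimum near x = -1.30
    # and a worse local minimum near x = +1.13.
    return x**4 - 3 * x**2 + x

def grad_f(x):
    return 4 * x**3 - 6 * x + 1

def descend(x, lr=0.01, steps=5000):
    # Plain gradient descent: repeatedly step downhill.
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

good = descend(-2.0)  # converges near x = -1.30, f = -3.51 (global minimum)
bad = descend(2.0)    # converges near x = +1.13, f = -1.07 (worse local minimum)
print(f(good), f(bad))
```

Both runs converge (the gradient goes to zero in each), but only one initialization finds the good minimum - and in millions of dimensions you can't simply try every basin.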