Comment by cubefox
15 days ago
Yes, they can overfit. SLT attributed this to large VC dimension, which apparently isn't the real story, because there exist various techniques/hacks (weight decay, dropout, early stopping, etc.) that effectively combat overfitting without actually reducing the very large VC dimension of those neural networks. Basically, the theory predicts they should always overfit, while in reality they mostly generalize surprisingly well. That's often the case in ML engineering: people discover that some things work well and others don't, without being exactly sure why. The famous Chinchilla scaling law was an empirical discovery, not a theoretical prediction, because theories like SLT are far too weak to make interesting predictions like that. Engineering is basically decades ahead of pure learning theory.
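For concreteness, the Chinchilla result is just a simple parametric formula fitted to training runs; here's a minimal sketch in Python. The functional form and the constants are the fitted values reported by Hoffmann et al. (2022), if I'm remembering them right, and everything else is purely illustrative:

```python
# Sketch of the Chinchilla parametric loss fit (Hoffmann et al. 2022):
#     L(N, D) = E + A / N**alpha + B / D**beta
# N = model parameters, D = training tokens. The constants below are
# the fitted values reported in the paper (to my recollection).

E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28

def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss for a model of N parameters on D tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

# E.g. the Chinchilla setup itself: 70B parameters, 1.4T tokens.
print(f"{chinchilla_loss(70e9, 1.4e12):.3f}")  # roughly 1.94
```

The point is that nothing in SLT predicts this functional form or these exponents; they came straight out of curve fitting on empirical training runs.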
> Please point me to these papers because I'm still learning.
Not sure which papers you have in mind. To be clear, I'm not an expert, just an interested layman. I just wanted to highlight the stark contrast between the apparently failed pure-math approach I learned years ago in a college class and the actual ML papers being released today, with major practical breakthroughs on a regular basis. Similarly practical papers were always available, just from very different people, e.g. LeCun or researchers at DeepMind, not from the theoretical computer science department people who wrote textbooks like the one here. Back in the day it wasn't very clear (to me) that the practice guys were really onto something while the theory guys were a dead end.