Comment by skydhash

12 hours ago

> The actual reason is due to complex biases that arise from the interaction of network architectures and the optimizers and persist in the regime where data scales proportionally to model size. The multiscale nature of the data induces neural scaling laws that enable better performance than any other class of models can hope to achieve.

That’s a lot of words to say that, if you encode a class of things as numbers, there’s a formula somewhere that can approximate an instance of that class. It works for linear regression and it works just as well for neural networks. The key thing here is approximation.

2 comments

hodgehog11 8 hours ago

No, it is relatively few words to quickly touch on several different concepts that go well beyond basic approximation theory.

I can construct a Gaussian process model (essentially fancy linear regression) that will fit _all_ of my medical image data _exactly_, but it will perform like absolute rubbish at determining tumor presence compared with a convolutional neural network trained on the same data and problem that _also_ fits the data perfectly.

I could even train a fully connected network on the same data and problem, get any degree of fit you like, and it would still be rubbish.
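
A minimal Python sketch of this point, under toy assumptions that are not from the thread: a 1-D sine target, eight training points, and hypothetical helper names (`rbf`, `solve`, `fit`, `mean_abs_error`). Two noise-free RBF-kernel interpolators both fit the training points essentially exactly, yet the one whose length scale is badly matched to the data predicts close to zero between them. It swaps medical images and CNNs for a far simpler setting, so it only illustrates that an exact fit alone does not determine generalization.

```python
import math

def rbf(a, b, length_scale):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * length_scale ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit(xs, ys, length_scale):
    """Return a kernel interpolant that passes through every (x, y) pair."""
    K = [[rbf(a, b, length_scale) for b in xs] for a in xs]
    alpha = solve(K, ys)
    return lambda x: sum(w * rbf(x, xi, length_scale) for w, xi in zip(alpha, xs))

def mean_abs_error(model, xs, ys):
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(xs)

# Toy data (an assumption for illustration): sin(x) sampled on [0, 2*pi].
train_x = [i * 2 * math.pi / 7 for i in range(8)]
train_y = [math.sin(x) for x in train_x]
test_x = [(i + 0.5) * 2 * math.pi / 7 for i in range(7)]  # midpoints between training points
test_y = [math.sin(x) for x in test_x]

narrow = fit(train_x, train_y, length_scale=0.05)  # length scale far below the point spacing
wide = fit(train_x, train_y, length_scale=1.0)     # length scale comparable to the point spacing

for name, model in [("narrow", narrow), ("wide", wide)]:
    print(name,
          "train MAE:", mean_abs_error(model, train_x, train_y),
          "test MAE:", mean_abs_error(model, test_x, test_y))
```

Both models interpolate the training set; only the length scale differs. In this sketch it stands in for the inductive bias that the architecture supplies in the image example.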

bubblyworld 11 hours ago

That isn't what they are saying at all, lol.