Comment by leereeves

1 year ago

Definitely. That's a fundamental observation called the bias-variance tradeoff. More flexible models are prone to overfitting, hitting each training point exactly, with wild gyrations in between.
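Here's a minimal sketch of that picture (just numpy on toy data; the sine target, the noise level, and the polynomial degrees are my own illustrative choices, not anything from above). A degree-9 polynomial has exactly enough coefficients to interpolate 10 noisy points, so it nails the training set and pays for it between the points; a degree-3 fit can't hit every point but tracks the underlying curve better:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, size=x.shape)  # noisy samples of a smooth target

# Degree 9 gives 10 coefficients for 10 points: it interpolates the noise exactly.
wild = np.polyfit(x, y, deg=9)
tame = np.polyfit(x, y, deg=3)

x_test = np.linspace(0, 1, 200)
y_true = np.sin(2 * np.pi * x_test)
print("degree-9 max train error:", np.abs(np.polyval(wild, x) - y).max())  # ~0: hits every point
print("degree-9 test RMSE:", np.sqrt(np.mean((np.polyval(wild, x_test) - y_true) ** 2)))
print("degree-3 test RMSE:", np.sqrt(np.mean((np.polyval(tame, x_test) - y_true) ** 2)))
```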

Big AI sidesteps that problem with sheer data volume: so much data that the model typically sees each example only once, which makes classic overfitting unlikely.

But even with the data held constant, adding more and more parameters is a strategy that works, so what gives? Are the functions somehow being regularized so effectively during training that you could get away with fewer parameters, and we just don't have the right model yet?
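One concrete, well-studied instance of that kind of implicit regularization: plain gradient descent on an overparameterized linear least-squares problem, started from zero, converges to the minimum-norm solution among the infinitely many that fit the data exactly. A minimal numpy sketch (the sizes and learning rate are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                      # far more parameters than data points
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# Plain gradient descent on squared loss, started from zero.
w = np.zeros(d)
lr = 1e-3                           # small enough for stability given this X
for _ in range(5000):
    w -= lr * X.T @ (X @ w - y)

# Of the infinitely many weight vectors with zero training loss,
# GD from zero converges to the minimum-norm one (the pseudoinverse solution).
w_min_norm = np.linalg.pinv(X) @ y
print("train residual:", np.linalg.norm(X @ w - y))                 # ~0
print("gap to min-norm solution:", np.linalg.norm(w - w_min_norm))  # ~0
```

The update never leaves the row space of X, so among all exact fits the optimizer lands on the smallest one. Whether something analogous explains why giant nonlinear networks generalize is exactly the open question here.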