Comment by ziofill
1 year ago
But with the data held constant, adding more and more parameters is a strategy that works, so what gives? Are the functions somehow getting regularized during training, so that effectively you could get away with fewer parameters and we just don't have the right model yet?