Comment by roadside_picnic
1 day ago
> One might as well say that neural networks trained with gradient descent are a reinvention of numerical methods for function approximation.
I don't know anyone who would disagree with that statement, and this is the standard framing I've encountered in nearly all neural network literature and courses. If you read any of the classic gradient-based papers, they fundamentally assume this position. Just take a quick read of "A Theoretical Framework for Back-Propagation" (LeCun, 1988) [0]; here's a quote from the abstract:
> We present a mathematical framework for studying back-propagation based on the Lagrangian formalism. In this framework, inspired by optimal control theory, back-propagation is formulated as an optimization problem with nonlinear constraints.
There's no way you can read that and not recognize that you're reading a paper on numerical methods for function approximation.
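To make the point concrete, here's a toy sketch of my own (not from either paper): a one-hidden-layer network fit to sin(x) with plain gradient descent and a hand-written chain-rule backward pass. The target function, network width, learning rate, and step count are all arbitrary choices for illustration; the point is just that "training a neural network" here is literally iterative numerical function approximation. (LeCun's framing, as I read it, is the same procedure viewed as minimizing the loss subject to the layer equations as constraints, with the backward pass recovering the multipliers.)

```python
# Toy example: approximate sin(x) with a tiny MLP trained by gradient descent.
# Everything below is a sketch with arbitrary hyperparameters, not a reference
# implementation from any of the cited papers.
import numpy as np

rng = np.random.default_rng(0)

# Sample the target function we want to approximate.
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(x)

# One hidden layer (32 tanh units), linear output.
W1 = rng.normal(scale=0.5, size=(1, 32))
b1 = np.zeros(32)
W2 = rng.normal(scale=0.5, size=(32, 1))
b2 = np.zeros(1)

lr = 0.05
for step in range(5000):
    # Forward pass.
    h = np.tanh(x @ W1 + b1)
    y_hat = h @ W2 + b2
    err = y_hat - y
    loss = np.mean(err ** 2)

    # Backward pass: the chain rule applied by hand (back-propagation).
    d_yhat = 2 * err / len(x)
    dW2 = h.T @ d_yhat
    db2 = d_yhat.sum(axis=0)
    dh = d_yhat @ W2.T
    dz = dh * (1 - h ** 2)        # derivative of tanh
    dW1 = x.T @ dz
    db1 = dz.sum(axis=0)

    # Gradient descent update.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"final MSE against sin(x): {loss:.5f}")
```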
The issue is that Vaswani et al. never mention this relationship.
If you mention every mathematical relationship one can think of in your paper, it’s going to get rejected for being way over the page limit lol.