Comment by ogrisel

9 hours ago

Paul Werbos did not apply backprop to MLPs in the clean form described in Hinton's paper, but rather to a kind of autoregressive, non-linear parametrized function with a much more specific scope of application.

Both papers directly apply the chain rule to compute the gradient of a multivariate function.
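
To make the chain-rule point concrete, here is a minimal sketch in plain Python. It is not taken from either paper: the toy model y = w2 * tanh(w1 * x), the squared-error loss, and all function names are assumptions chosen only for illustration. The gradient is built by multiplying local derivatives from the output backwards, and a finite-difference check is included as a sanity test.

```python
import math


def forward(w1, w2, x):
    """Toy two-step composition (hypothetical model): y = w2 * tanh(w1 * x)."""
    h = math.tanh(w1 * x)
    return w2 * h, h


def loss_and_grad(w1, w2, x, target):
    """Squared-error loss and its gradient w.r.t. (w1, w2) via the chain rule."""
    y, h = forward(w1, w2, x)
    loss = 0.5 * (y - target) ** 2

    # Walk backwards from the loss, multiplying local derivatives.
    dloss_dy = y - target            # d loss / d y
    grad_w2 = dloss_dy * h           # dy/dw2 = h
    dloss_dh = dloss_dy * w2         # dy/dh = w2
    dh_dz = 1.0 - h ** 2             # tanh'(z) at z = w1 * x
    grad_w1 = dloss_dh * dh_dz * x   # dz/dw1 = x
    return loss, (grad_w1, grad_w2)


def numeric_grad(w1, w2, x, target, eps=1e-6):
    """Central finite differences, used only to sanity-check the analytic gradient."""
    def loss_only(a, b):
        return loss_and_grad(a, b, x, target)[0]

    g1 = (loss_only(w1 + eps, w2) - loss_only(w1 - eps, w2)) / (2 * eps)
    g2 = (loss_only(w1, w2 + eps) - loss_only(w1, w2 - eps)) / (2 * eps)
    return g1, g2


if __name__ == "__main__":
    _, analytic = loss_and_grad(0.5, -1.3, 2.0, 0.7)
    numeric = numeric_grad(0.5, -1.3, 2.0, 0.7)
    print("analytic:", analytic)
    print("numeric: ", numeric)
```

The two printed gradients should agree to several decimal places. Whether the function being differentiated is an MLP or an autoregressive model, the backward pass is the same bookkeeping over local derivatives.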