Comment by mr_toad
1 year ago
You need a non-linear activation function for the universal approximation theorem to hold. Otherwise, as others have said, the model just collapses to a single linear layer.
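For example, here is a minimal NumPy sketch (layer sizes and data are arbitrary) showing that stacked linear layers with nothing between them compose into one linear map; the same holds with biases, since a composition of affine maps is affine:

    import numpy as np

    rng = np.random.default_rng(0)

    # Three "layers" with no activation between them: y = W3 @ (W2 @ (W1 @ x))
    W1 = rng.standard_normal((8, 4))
    W2 = rng.standard_normal((8, 8))
    W3 = rng.standard_normal((2, 8))

    x = rng.standard_normal(4)

    deep = W3 @ (W2 @ (W1 @ x))       # the "three-layer" network
    collapsed = (W3 @ W2 @ W1) @ x    # the single equivalent layer

    print(np.allclose(deep, collapsed))  # True: depth buys nothing without non-linearity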
Technically the output is still what a statistician would call “linear in the parameters” (it is a linear combination of the hidden units, so it is linear in the final layer's weights), but due to the universal approximation theorem the network as a whole can approximate any continuous non-linear function.
https://stats.stackexchange.com/questions/275358/why-is-incr...
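To make the “linear in the parameters” point concrete, here is a random-features sketch (the tanh units, hidden width, and sin(2x) target are arbitrary choices, not anything from the thread): the hidden weights are frozen at random values and only the output weights are fit, so the fit is an ordinary linear least-squares problem in those parameters, yet the model matches a non-linear function:

    import numpy as np

    rng = np.random.default_rng(0)

    # Target: a non-linear function of x.
    x = np.linspace(-3, 3, 200)
    y = np.sin(2 * x)

    # One hidden layer with *fixed* random weights; only the output weights c are fit.
    # The model sum_i c_i * tanh(w_i * x + b_i) is linear in c, non-linear in x.
    n_hidden = 50
    w = rng.standard_normal(n_hidden)
    b = rng.uniform(-3, 3, n_hidden)
    H = np.tanh(np.outer(x, w) + b)            # (200, 50) design matrix of hidden units

    c, *_ = np.linalg.lstsq(H, y, rcond=None)  # ordinary least squares in c
    print(np.max(np.abs(H @ c - y)))           # residual should be small: a good fit to sin(2x)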
As you can see in what I just posted about an inch below this, my point is that the process of training a NN does not involve adjusting any parameters of the non-linear functions themselves. What goes into an activation function is a weighted sum plus a bias, i.e. an affine combination of the previous layer's outputs, but there's no "tunable" parameter (one adjusted during training) that's fed into the activation function itself.
Learnable parameters on activations do exist; look up parametric activation functions such as PReLU.
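For instance, here is a minimal PyTorch sketch (layer sizes arbitrary) using nn.PReLU, whose negative-side slope is a parameter that receives gradients and is updated during training like any weight:

    import torch
    import torch.nn as nn

    # PReLU: f(x) = x for x >= 0, a * x for x < 0, with the slope `a` learnable.
    model = nn.Sequential(nn.Linear(4, 8), nn.PReLU(), nn.Linear(8, 1))

    x = torch.randn(16, 4)
    loss = model(x).pow(2).mean()
    loss.backward()

    prelu = model[1]
    print(prelu.weight)       # the activation's own trainable parameter `a`
    print(prelu.weight.grad)  # gradient flows to it, so it is adjusted during training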
Of course they exist. A parameterized activation function is one of the most obvious things to try in NN design, and has certainly been invented and studied by thousands of researchers.