Comment by spot5010

1 year ago

That would be super cool if it works! I’ve also wondered the same thing about activation functions. Why not let the algorithm learn the activation function?

2 comments

spot5010

porridgeraisin 1 year ago

This idea exists (the broad field is called neural architecture search), although you have to parameterize it somehow to allow gradient descent to happen.

Here are examples:

https://arxiv.org/abs/2009.04759

https://arxiv.org/abs/1906.09529

FuckButtons 1 year ago

Mostly because of computational efficiency irrc, the non linearity doesn’t seem to have much impact, so picking one that’s fast is a more efficient use of limited computational resources.