Comment by spot5010

2 months ago

That would be super cool if it works! I’ve also wondered the same thing about activation functions. Why not let the algorithm learn the activation function?

Mostly because of computational efficiency, iirc. The exact choice of nonlinearity doesn’t seem to have much impact, so picking one that’s fast is a more efficient use of limited computational resources.
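
For what it's worth, here is a rough sketch of what "learning the activation" can look like in its simplest form: a PReLU-style function where only the negative-side slope is a trainable parameter. This is a toy under stated assumptions, not anyone's actual implementation; the names `prelu`, `prelu_grad_a`, and `fit_slope`, the toy data, and the learning rate are all made up for illustration.

```python
# Sketch only: a PReLU-style activation whose negative-side slope `a` is learned.
# All names and values here are illustrative assumptions, not a library API.

def prelu(x, a):
    """f(x) = x for x > 0, a * x otherwise."""
    return x if x > 0 else a * x

def prelu_grad_a(x):
    """df/da: zero on the positive side, x on the negative side."""
    return 0.0 if x > 0 else x

def fit_slope(xs, ys, a=0.25, lr=0.1, steps=200):
    """Fit the slope `a` by plain gradient descent on mean squared error."""
    n = len(xs)
    for _ in range(steps):
        # Chain rule: d(MSE)/da = mean of 2 * (f(x) - y) * df/da
        grad = sum(2 * (prelu(x, a) - y) * prelu_grad_a(x)
                   for x, y in zip(xs, ys)) / n
        a -= lr * grad
    return a

# Toy targets generated with a "true" slope of 0.1 (an assumed value).
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [prelu(x, 0.1) for x in xs]
learned_a = fit_slope(xs, ys)  # should end up close to 0.1
```

Even in this tiny case you can see the overhead: a fixed ReLU needs no extra parameter, no extra gradient, and no extra update step, while the learned version needs all three.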