Comment by monkfish328

8 months ago

Or for that matter, a transform that's learned from the data :) A neural net for the transform itself!

3 comments

monkfish328

That would be super cool if it works! I’ve also wondered the same thing about activation functions. Why not let the algorithm learn the activation function?

porridgeraisin 8 months ago

This idea exists (the broad field is called neural architecture search), although you have to parameterize the activation function somehow so that gradient descent can be applied to it.
Here are some examples:
https://arxiv.org/abs/2009.04759
https://arxiv.org/abs/1906.09529
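
To make "parameterize it somehow" concrete, here is a minimal sketch, assuming a Swish-style activation x * sigmoid(beta * x) with a single learnable scalar beta. It is an illustrative toy, not the method from either linked paper; the names `swish`, `fit_beta`, the target value 2.0, and the learning rate are all assumptions picked for the example. The point is only that once the activation has a differentiable parameter, plain gradient descent can tune it like any other weight.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def swish(x, beta):
    # Parameterized activation: beta controls the shape of the curve.
    return x * sigmoid(beta * x)

def dswish_dbeta(x, beta):
    # Derivative of swish with respect to beta: x^2 * s * (1 - s).
    s = sigmoid(beta * x)
    return x * x * s * (1.0 - s)

def mse(xs, ys, beta):
    return sum((swish(x, beta) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_beta(xs, ys, beta=0.5, lr=1.0, steps=2000):
    # Plain gradient descent on the mean squared error, updating only beta.
    n = len(xs)
    for _ in range(steps):
        grad = sum(
            2.0 * (swish(x, beta) - y) * dswish_dbeta(x, beta)
            for x, y in zip(xs, ys)
        ) / n
        beta -= lr * grad
    return beta

# Hypothetical target: data generated by a swish with beta = 2.0.
xs = [i / 10.0 for i in range(-30, 31)]
ys = [swish(x, 2.0) for x in xs]

learned_beta = fit_beta(xs, ys)
print(learned_beta)
```

In a real network beta would be trained jointly with the weights (per layer or per channel) by the framework's autodiff rather than with a hand-written derivative.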
FuckButtons 8 months ago

Mostly because of computational efficiency, IIRC. The choice of nonlinearity doesn’t seem to have much impact, so picking one that’s fast is a more efficient use of limited computational resources.