Comment by pillefitz
2 months ago
In image processing at least, NNs typically learn something resembling a Fourier or wavelet representation (Gabor-like filters) in their first layers. The biggest benefit of applying a transformation beforehand is to reduce training time and obtain better generalization by "removing the dimension that doesn't matter".
E.g., in a suitable space, one coordinate could represent the rotation of an object. You could apply the transform and discard that dimension if your NN should be rotation-invariant.
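To make the "discard the rotation coordinate" idea concrete, here is a minimal sketch under some assumptions of mine: the object is described by a hypothetical angular profile (say, boundary radius sampled at evenly spaced angles), so a rotation becomes a circular shift of that profile. In the Fourier domain the shift only changes the phases, so keeping the magnitudes and dropping the phases throws away the rotation. The names `dft_magnitudes`, `rotate` and `profile` are made up for illustration, and the naive DFT is only there to keep the example dependency-free.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return the magnitudes of the DFT of a real sequence.

    Naive O(n^2) DFT, fine for a sketch. The phases, which carry the
    circular-shift (rotation) information, are deliberately dropped.
    """
    n = len(samples)
    mags = []
    for k in range(n):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        mags.append(abs(coeff))
    return mags


def rotate(samples, shift):
    """Rotate the object by `shift` angular steps (a circular shift)."""
    shift %= len(samples)
    return samples[-shift:] + samples[:-shift]


# Hypothetical angular profile: boundary radius at 8 evenly spaced angles.
profile = [1.0, 2.0, 1.5, 0.5, 1.0, 3.0, 0.2, 0.7]
rotated = rotate(profile, 3)

features = dft_magnitudes(profile)
features_rotated = dft_magnitudes(rotated)

# The raw inputs differ, but the features fed to the NN would match
# up to floating-point error.
same = all(abs(a - b) < 1e-9 for a, b in zip(features, features_rotated))
```

The NN would then be trained on `features` instead of the raw profile, so it never has to learn that rotated copies are the same object. Note that dropping the phases also discards some shape information, so this is a trade-off rather than a free win.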
In image processing, I thought there was a whole host of specialized algorithms, such as edge detection, SCC, etc., that were run before the data was even fed into the ANN.