Comment by fc417fc802
17 hours ago
Note that I'm not great at math so it's possible I've entirely misunderstood you.
Here's an example of directly leveraging a transform to optimize the training process. ( https://arxiv.org/abs/2410.21265 )
And here are two examples that apply geometry to neural nets more generally. ( https://arxiv.org/abs/2506.13018 ) ( https://arxiv.org/abs/2309.16512 )
From the abstract and skimming a few sections of the first paper, imho it is not really the same. The paper maps the loss gradient from its dual space back to the space where the weight updates live, which gives better-behaved gradient-descent steps, but as far as I understand neither the loss function nor the neural net is analyzed in a new way.
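If I read it right (this is just a sketch of the general pattern, not necessarily the paper's exact formulation), it is the usual steepest-descent-under-a-norm setup:

$$\Delta w \;=\; \arg\min_{\Delta w}\ \langle \nabla L(w),\, \Delta w\rangle \;+\; \frac{1}{2\eta}\,\lVert \Delta w\rVert^2 .$$

With the Euclidean norm this reduces to plain gradient descent, $\Delta w = -\eta\,\nabla L(w)$; with another norm the gradient (a dual vector) is pushed through a different duality map before it becomes a weight update. Either way the loss and the net stay in the same coordinates, which is why it feels different from a Fourier-style change of basis.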
The Fourier and wavelet transforms are different: they are unitary operators (i.e., changes of orthonormal basis) on a space of functions, not on the finite-dimensional vector space of weights that parametrize a net. They simplify some usually hard operators, such as derivatives and integrals, by reducing them to multiplications and divisions, or to a sparse algebra.
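A minimal numerical sketch of that last point (my own toy example with numpy, not anything from the referenced papers): differentiation becomes pointwise multiplication by $ik$ in the Fourier domain.

```python
import numpy as np

# Toy example: differentiate a smooth periodic function via the FFT.
# The derivative operator, hard in general, is diagonal in the Fourier basis.
n = 256
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
f = np.sin(3 * x) + 0.5 * np.cos(7 * x)          # arbitrary test function

k = 2 * np.pi * np.fft.fftfreq(n, d=x[1] - x[0])  # angular wavenumbers
df_spectral = np.fft.ifft(1j * k * np.fft.fft(f)).real  # d/dx as multiplication by ik

df_exact = 3 * np.cos(3 * x) - 3.5 * np.sin(7 * x)
print(np.max(np.abs(df_spectral - df_exact)))     # ~1e-12, machine precision
```

The same trick is what makes spectral methods fast: derivatives, integrals and convolutions become diagonal in the Fourier basis (or sparse, for wavelets).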
So in a certain sense these methods are looking at projections, which are unhelpful when thinking about NN weights, since the weights are all mixed with each other in a very non-linear way.
Thanks a bunch for the references. Reading the abstracts, these seem to use a different idea from what Fourier analysis is about, but they should nonetheless be a very interesting read.