Comment by madhadron
18 hours ago
All these transforms are switching to an eigenbasis of some differential operator (one that usually corresponds to a differential equation of interest). Spherical harmonics, Bessel and Hankel functions (the radial analogues of sines/cosines and complex exponentials, respectively), and on and on.
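To make the eigenbasis point concrete with the simplest cases (standard facts, nothing exotic): the complex exponentials are eigenfunctions of the second derivative, and the spherical harmonics are eigenfunctions of the Laplacian on the sphere,

    \frac{d^2}{dx^2} e^{ikx} = -k^2 \, e^{ikx}, \qquad \Delta_{S^2} Y_\ell^m = -\ell(\ell+1)\, Y_\ell^m,

so in those bases the operator becomes plain multiplication by its eigenvalues.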
The next big jump was to collections of functions not parameterized by subsets of R^n. Wavelets use a tree-shaped parameter space.
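Concretely, the usual dyadic construction indexes the family by an integer scale and shift,

    \psi_{j,k}(t) = 2^{j/2}\, \psi(2^j t - k), \qquad j, k \in \mathbb{Z},

and the atom at node (j, k) sits over roughly [2^{-j} k, 2^{-j}(k+1)], with children (j+1, 2k) and (j+1, 2k+1) refining it, so the index set is a binary tree rather than a subset of R^n.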
There’s a whole interesting area of overcomplete basis sets that I have been meaning to look into, where you give up orthogonality and all those nice properties in exchange for having multiple ways to adapt to different signal characteristics.
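Roughly the flavor, as I understand it, in a throwaway sketch: a hypothetical overcomplete dictionary made of DCT atoms plus spikes, and a plain greedy matching-pursuit loop that picks whichever atoms fit best (illustrative only, not any particular published scheme):

    import numpy as np
    from scipy.fft import dct

    # Overcomplete dictionary for length-64 signals: 64 orthonormal DCT atoms + 64 spikes.
    n = 64
    D = np.hstack([dct(np.eye(n), norm="ortho"), np.eye(n)])   # shape (64, 128)

    # A signal that is sparse in neither half alone: a smooth cosine plus one spike.
    t = np.arange(n)
    x = np.cos(2 * np.pi * 3 * t / n)
    x[17] += 2.0

    # Matching pursuit: repeatedly grab the atom most correlated with the residual.
    residual, coeffs = x.copy(), np.zeros(D.shape[1])
    for _ in range(5):
        scores = D.T @ residual
        k = np.argmax(np.abs(scores))
        coeffs[k] += scores[k]
        residual -= scores[k] * D[:, k]

    print("picked atoms:", np.nonzero(coeffs)[0], "residual norm:", np.linalg.norm(residual))

The point of the redundancy is exactly that mixed signal: neither the Fourier-type half nor the spike half handles it well alone, but the union does.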
I don’t think these transforms are going to be relevant to understanding neural nets, though. Neural nets are, by their nature, doing something with nonlinear structures in high dimensions that are not smoothly extended across their domain, which is the opposite of the problem all our current approaches to functional analysis deal with.
You may well be right about neural networks. Sometimes models that seem nonlinear turn linear if those nonlinearities are pushed into the basis functions, so one can still hope.
For GPT-like models, I see sentences as trajectories in the embedding space. These trajectories look quite complicated and not obviously structured from a geometric standpoint. My hope is that if we get the coordinate system right, we may see something more intelligible going on.
This is just a hope, a mental bias. I do not have any solid argument for why it should be as I describe.
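A rough sketch of the picture I have in mind, with GPT-2 standing in for the model and a PCA projection standing in for the "right" coordinate system (both are just placeholders):

    import torch
    from transformers import AutoTokenizer, AutoModel

    # GPT-2 is only a stand-in; any decoder-style model with hidden states would do.
    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)

    inputs = tok("The cat sat quietly on the warm mat.", return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)

    # One point per token in the final hidden layer: the sentence as a curve in R^768.
    traj = out.hidden_states[-1][0]             # (seq_len, 768)

    # Crude projection to 3 components, purely to eyeball the curve.
    centered = traj - traj.mean(dim=0)
    _, _, V = torch.pca_lowrank(centered, q=3)
    print((centered @ V).shape)                 # (seq_len, 3)

In raw coordinates those curves look like tangles; the hope is that a better-chosen coordinate system would make them legible.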
> Sometimes models that seem nonlinear turn linear if those nonlinearities are pushed into the basis functions, so one can still hope.
That idea was pushed to its limit by Koopman operator theory. The argument sounds quite good at first, but unfortunately it can’t really work in all cases in its current formulation [1].
[1]: https://arxiv.org/abs/2407.08177
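For a concrete picture of what the lifting does, here is the standard textbook example (not from [1]): the nonlinear system

    \dot x_1 = \mu x_1, \qquad \dot x_2 = \lambda (x_2 - x_1^2)

becomes exactly linear in the lifted coordinates (y_1, y_2, y_3) = (x_1, x_2, x_1^2),

    \frac{d}{dt} \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}
      = \begin{pmatrix} \mu & 0 & 0 \\ 0 & \lambda & -\lambda \\ 0 & 0 & 2\mu \end{pmatrix}
        \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix},

but only because (x_1, x_2, x_1^2) happens to span a Koopman-invariant subspace; for a general system no finite set of observables does, which is where the trouble starts.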
Quite so. Quite so indeed.
We know that under benign conditions an infinite-dimensional basis must exist, but finding it from finite samples is very non-trivial; we don't know how to do it in the general case.
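A tiny sketch of the finite-sample side of that, using EDMD (extended dynamic mode decomposition) on a toy map with a hand-picked dictionary of observables; picking the dictionary is exactly the luxury you don't have in general:

    import numpy as np

    # Toy nonlinear map whose Koopman operator is exactly linear on (x1, x2, x1^2).
    mu, lam = 0.9, 0.5
    def step(x):
        return np.array([mu * x[0], lam * x[1] + (mu**2 - lam) * x[0]**2])

    # Hand-picked dictionary of observables; in general you don't know what to put here.
    def psi(x):
        return np.array([x[0], x[1], x[0]**2])

    # Snapshot pairs (psi(x_n), psi(x_{n+1})) from short random trajectories.
    rng = np.random.default_rng(0)
    X, Y = [], []
    for _ in range(50):
        x = rng.uniform(-1, 1, size=2)
        for _ in range(20):
            y = step(x)
            X.append(psi(x)); Y.append(psi(y))
            x = y
    X, Y = np.array(X).T, np.array(Y).T     # shape (3, N)

    # EDMD: least-squares linear operator K with Y ≈ K X in observable space.
    K = Y @ np.linalg.pinv(X)
    print(np.round(K, 3))   # should be close to [[mu,0,0],[0,lam,mu**2-lam],[0,0,mu**2]]

With the right dictionary the fit is essentially exact; with the wrong one you get a linear model that quietly misses the dynamics.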
I’m not sure what you mean by a change of basis making a nonlinear system linear. A linear system is one where solutions add as elements of a vector space. That’s true no matter what basis you express it in.
Note that I'm not great at math so it's possible I've entirely misunderstood you.
Here's an example of directly leveraging a transform to optimize the training process. ( https://arxiv.org/abs/2410.21265 )
And here are two examples that apply geometry to neural nets more generally. ( https://arxiv.org/abs/2506.13018 ) ( https://arxiv.org/abs/2309.16512 )
From the abstract and from skimming a few sections of the first paper, imho it is not really the same. The paper moves the loss gradient to the tangent dual space where the weights reside, for better performance in gradient descent, but as far as I understand neither the loss function nor the neural net is analyzed in a new way.
The Fourier and wavelet transforms are different: they diagonalize self-adjoint operators (hence the orthogonal basis) on a space of functions, not on the finite-dimensional vector space of weights that parametrizes a net, and they simplify some usually hard operators, such as derivatives and integrals, by reducing them to multiplications and divisions or to a sparse algebra.
So in a certain sense these methods are looking at projections, which are unhelpful when thinking about NN weights since they are all mixed with each other in a very non-linear way.
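The "derivatives become multiplications" part in concrete form, just a throwaway numpy check (nothing to do with the papers above):

    import numpy as np

    # Differentiate a periodic signal by multiplying its Fourier coefficients by i*k.
    n = 256
    x = np.linspace(0, 2 * np.pi, n, endpoint=False)
    f = np.sin(3 * x) + 0.5 * np.cos(7 * x)

    k = 2 * np.pi * np.fft.fftfreq(n, d=x[1] - x[0])    # angular wavenumbers
    df_spectral = np.fft.ifft(1j * k * np.fft.fft(f)).real

    df_exact = 3 * np.cos(3 * x) - 3.5 * np.sin(7 * x)
    print(np.max(np.abs(df_spectral - df_exact)))       # ~1e-13

That is the kind of simplification I mean, and it lives on the function space, not on the weights.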
Thanks a bunch for the references. From the abstracts, these use a different idea from what Fourier analysis is about, but they should nonetheless be a very interesting read.