Comment by markisus

2 months ago

The Fourier transform is taken along the “token” dimension. However, in many applications, this dimension is not meaningful. That’s why transformers are a great option for consuming data which is permutation invariant.

I would like to see additional experiments using the lesser known Fourier transform over finite groups [1], which is permutation invariant but shares many properties with the standard Fourier transform.

I also wonder if this becomes the next big thing for LLMs, how easy will it be for inference engines(eg vLLM, llama.cpp) to integrate it?

[1] https://en.wikipedia.org/wiki/Fourier_transform_on_finite_gr...

Not an expert in this space.

Aren't tokens transformed with position dependent information in most models?

I believe llama applies a rotation to the vector based on the position in the input.

What's the finite group in this case?

  • I’m thinking the integers mod 2^n where n is something computers are good at (8, 32, 64). You have hardware support the group operation.

    • That is the traditional Fourier transform, except it can be a cyclic group of any size, doesn't need to be a power of 2. (Though FFTs with 2^n input size are particularly easy to implement.)

      And it's not permutation invariant.

      1 reply →

    • You mean for the group operation to be standard modular addition? In that case (as a sibling comment says) you'll recover the classic discrete Fourier transform.