← Back to context

Comment by markisus

10 months ago

The Fourier transform is taken along the “token” dimension. However, in many applications, this dimension is not meaningful. That’s why transformers are a great option for consuming data which is permutation invariant.

I would like to see additional experiments using the lesser known Fourier transform over finite groups [1], which is permutation invariant but shares many properties with the standard Fourier transform.

I also wonder if this becomes the next big thing for LLMs, how easy will it be for inference engines(eg vLLM, llama.cpp) to integrate it?

[1] https://en.wikipedia.org/wiki/Fourier_transform_on_finite_gr...

Not an expert in this space.

Aren't tokens transformed with position dependent information in most models?

I believe llama applies a rotation to the vector based on the position in the input.

What's the finite group in this case?

  • I’m thinking the integers mod 2^n where n is something computers are good at (8, 32, 64). You have hardware support the group operation.

    • That is the traditional Fourier transform, except it can be a cyclic group of any size, doesn't need to be a power of 2. (Though FFTs with 2^n input size are particularly easy to implement.)

      And it's not permutation invariant.

      1 reply →

    • You mean for the group operation to be standard modular addition? In that case (as a sibling comment says) you'll recover the classic discrete Fourier transform.