Comment by pointlessone

8 months ago

I’m still confused. Does it treat the input tokens as a sampled waveform?

I mean, say I have some text file in ASCII. Do I then just pretend it’s raw WAV and do an FFT on it? I guess it can give me some useful information (like whether it looks like any particular natural language or is just random; this is sometimes used in encryption analysis of simple substitution cyphers). It feels surprising that a reverse FFT can produce coherent output after fiddling with the distribution.

5 comments

xeonmc 8 months ago

Do keep in mind that FFT is a lossless, equivalent representation of the original data.
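
To make the "lossless" point concrete, here is a minimal sketch that treats ASCII bytes as samples, transforms them, and transforms them back. It uses a naive O(n^2) DFT written with the standard library as a stand-in for the FFT (the FFT is a fast algorithm for the same transform); the helper names `dft` and `idft` are just for this illustration.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform; an FFT gives the same values, faster."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse of dft(): recovers the original samples up to rounding error."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

text = "FFT is invertible"
samples = [float(b) for b in text.encode("ascii")]  # bytes treated as a waveform
spectrum = dft(samples)                             # frequency-domain view
# Round away floating-point noise and turn the samples back into text.
recovered = bytes(round(v.real) for v in idft(spectrum)).decode("ascii")
print(recovered == text)
```

Nothing is lost as long as the spectrum is left untouched; whether output stays coherent after the spectrum is modified is a separate question that this sketch does not address.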

yobbo 8 months ago

As I understand it, the token embedding stream would be equivalent to multi-channel sampled waveforms. The model either needs to learn the embeddings by back-propagating through FFT and IFFT, or use some suitable tokenization scheme which the paper doesn't discuss (?).

It seems unlikely to work for language.

jampekka 8 months ago

It embeds them first into vectors. The input is a real matrix with (context length)x(embedding size) dimensions.
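
A rough sketch of that shape, not the paper's method: the vocabulary, the random embedding table, and the choice to transform each embedding channel along the sequence axis are all assumptions made for illustration. A naive standard-library DFT stands in for the FFT.

```python
import cmath
import random

random.seed(0)
vocab = {"the": 0, "cat": 1, "sat": 2, "down": 3}  # hypothetical toy vocabulary
embed_dim = 3
# Hypothetical embedding table: random values stand in for learned weights.
table = [[random.gauss(0, 1) for _ in range(embed_dim)] for _ in vocab]

tokens = ["the", "cat", "sat", "down"]
# Real matrix with (context length) x (embedding size) entries.
X = [table[vocab[t]] for t in tokens]

def dft(x):
    """Naive discrete Fourier transform; an FFT gives the same values, faster."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

# Treat each embedding dimension as one channel sampled across token positions,
# then transform every channel along the sequence axis (an assumption).
channels = [[row[c] for row in X] for c in range(embed_dim)]
spectra = [dft(ch) for ch in channels]

print(len(X), len(X[0]))              # context length, embedding size
print(len(spectra), len(spectra[0]))  # channels, frequency bins per channel
```

So the transform operates on real-valued vectors produced by the embedding step, not on the raw token IDs or ASCII codes.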

blovescoffee 8 months ago

No. The FFT is an operation on a discrete domain; it is not the continuous FT. Just as when audio waveforms are processed by an FFT, you bucket frequencies into bins, which is conceptually a vector. Once you have a vector, you do machine learning as you would with any vector (except you apply some FT in this case; I haven’t read the paper).

lta 8 months ago

Most likely the embedding of each token is passed through the FFT.