Comment by a-dub

2 months ago

hm. since the filtering only acts on magnitudes, nothing that happens in the fourier domain can touch the phases, which seems like a constraint that would change the behavior of the surrounding layers.

the default bias of -0.1 going into the relus, combined with what i would expect to be a flattish spectrum, also seems like it would make for a sparse representation in the fourier domain: any bin whose magnitude lands below 0.1 just gets zeroed.
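
a rough sketch of the operation as i understand it, assuming the block is roughly fft -> learned filter on the magnitudes -> relu with the -0.1 bias -> recombine with the original phases -> inverse fft. the function and parameter names here (magnitude_filter_block, log_gain) are mine, not from the paper:

    import torch

    def magnitude_filter_block(x, log_gain, bias=-0.1):
        # x: (batch, seq, dim) real; log_gain: learned, broadcastable
        # over the rfft bins. a sketch, not the actual implementation.
        X = torch.fft.rfft(x, dim=1)        # complex spectrum over seq
        mag, phase = X.abs(), X.angle()
        # the filter and relu act on the magnitudes only; with bias=-0.1,
        # any bin whose scaled magnitude falls below 0.1 gets zeroed,
        # which is where the sparsity in the fourier domain would come from
        mag = torch.relu(mag * log_gain.exp() + bias)
        # the phases pass through untouched
        return torch.fft.irfft(torch.polar(mag, phase), n=x.shape[1], dim=1)

how sparse this actually gets would depend on how the magnitudes are scaled going in, since a flattish spectrum sitting just above 0.1 zeroes very differently than one sitting well above it.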

i assume this is learning the text embeddings at training time; if so, i'd be curious how the constraints of going through the fft and having the magnitudes filtered could change how the learned embeddings end up looking.