Comment by a-dub
2 months ago
hm. since only the magnitudes are filtered, nothing that happens in the fourier domain can touch the phases, which seems like a constraint that would change the behavior of the other layers.
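(a minimal sketch of the magnitude-only filtering i'm talking about, in pytorch; the function name and the per-bin filter shape are my own assumptions, not from the project:)

```python
import torch

def magnitude_only_filter(x, filt):
    # hypothetical layer: scale the fft magnitudes, leave the phases untouched
    spec = torch.fft.rfft(x, dim=-1)              # complex spectrum of a real signal
    mag, phase = spec.abs(), spec.angle()         # split into magnitude and phase
    spec = torch.polar(mag * filt, phase)         # only the magnitudes change
    return torch.fft.irfft(spec, n=x.shape[-1], dim=-1)

x = torch.randn(4, 128)                           # e.g. a batch of embeddings
filt = torch.rand(128 // 2 + 1)                   # nonnegative gain per frequency bin
y = magnitude_only_filter(x, filt)                # phases pass through unmodified
```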
the default bias of -0.1 with relus, plus what i would expect to be a flattish spectrum, also seems like it would make for a sparse representation in the fourier domain, since the relu zeros out any bin whose magnitude doesn't clear the bias.
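(to illustrate the sparsity point; i'm guessing the filter has a form like relu(w * |X_k| + b) with b defaulting to -0.1, so treat the whole form as an assumption:)

```python
import torch
import torch.nn.functional as F

mag = torch.rand(65)              # stand-in for a flattish magnitude spectrum in [0, 1)
w = torch.ones(65)                # hypothetical per-bin weight at init
b = torch.full((65,), -0.1)       # the default bias in question

out = F.relu(w * mag + b)         # bins with w * |X_k| <= 0.1 get zeroed exactly
print((out == 0).float().mean())  # fraction of dead bins; ~0.1 for this toy spectrum
```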
i assume this is learning the text embeddings at training time. if so, i'd be curious how the constraints of going through the fft and having their magnitudes filtered would change how the learned embeddings look.