
Comment by roflmaostc

1 year ago

Reference for the later part?

7 comments


yagizdegirmenci 1 year ago

Section "3.3 Implementation" is mostly about hardware-level speedups, and basically says:

On GPUs, the FFT is consistently faster, but on TPUs, matrix multiplication was faster for shorter sequences.
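
To make the two options concrete, here is a minimal pure-Python sketch of computing the same DFT either as a dense matrix-vector product or with a radix-2 FFT. The function names (`dft_matmul`, `fft_radix2`, `max_abs_diff`) are made up for illustration and are not taken from the paper; it only shows that both routes give the same result and says nothing about real GPU or TPU performance.

```python
import cmath

def dft_matmul(x):
    """Naive DFT written as a dense matrix-vector product: O(n^2) multiplies."""
    n = len(x)
    # Row k of the DFT matrix is exp(-2*pi*i*j*k/n) for j = 0..n-1.
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])  # FFT of even-indexed samples
    odd = fft_radix2(x[1::2])   # FFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd part
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two sequences."""
    return max(abs(p - q) for p, q in zip(a, b))
```

For a short input such as `x = [1, 2, 3, 4, 0, -1, -2, -3]`, `max_abs_diff(dft_matmul(x), fft_radix2(x))` should be at the level of floating-point rounding. The matrix form maps directly onto matmul units, which is presumably why it can win for short sequences on matmul-centric hardware.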

xphos 1 year ago
Yeah, but a comparison of power utilization is needed too. You can build hardware that is better than a GPU at one thing, e.g. making MatMul really efficient and fast. However, dedicated FFT hardware would annihilate it on power and speed at large enough n, simply because MatMul does O(n^3) multiplications as opposed to the O(n log n) multiplies that the FFT does (complex versus real multiplies notwithstanding).
- SJC_Hacker 1 year ago
  
  FFT is only O(N log N) for a vector of length N. With respect to matrices, for an N by N matrix it would be more like O(N^2 log N), since you would perform an FFT for each row or column.
  
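
To put rough numbers on the complexities mentioned in this exchange, here is a small sketch that tallies leading-order multiply counts. The helper names are hypothetical, the counts ignore additions and constant factors beyond those shown, FFT multiplies are complex rather than real, and the radix-2 formula assumes n is a power of two.

```python
def matmul_multiplies(n):
    """Schoolbook product of two n x n matrices: n^3 scalar multiplies."""
    return n ** 3

def dft_matvec_multiplies(n):
    """Dense n x n DFT matrix times a length-n vector: n^2 multiplies."""
    return n ** 2

def fft_1d_multiplies(n):
    """Radix-2 FFT of a length-n vector: (n/2) * log2(n) twiddle multiplies."""
    log2_n = n.bit_length() - 1  # exact log2 when n is a power of two
    return (n // 2) * log2_n

def fft_2d_multiplies(n):
    """Row-column 2D FFT of an n x n matrix: n row FFTs plus n column FFTs."""
    return 2 * n * fft_1d_multiplies(n)
```

For n = 1024 these give 1,073,741,824 multiplies for a matrix-matrix product, 1,048,576 for a DFT as a matrix-vector product, 5,120 for a 1D FFT, and 10,485,760 for a row-column 2D FFT, so which comparison applies depends on whether a single vector or a full matrix is being transformed.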