Comment by bee_rider
10 months ago
That seems like an odd comparison; specialty hardware is often better, right?
Hey, do DSPs have special hardware to help with FFTs? (I’m actually asking, this isn’t a rhetorical question, I haven’t used one of the things but it seems like it could vaguely be helpful).
Xilinx has a very highly optimized core for the FFT. You are restricted to power-of-2 sizes, which usually isn't a problem because it's fairly common to zero-pad an FFT anyway to avoid highly aliased (i.e. hard-edged) binning.
The downside of implementing it directly in hardware is that the size would be fixed.
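To make the power-of-2 restriction and the zero-padding workaround concrete, here is a minimal pure-Python sketch. It is not how the Xilinx core (or any hardware) is implemented; `next_pow2`, `zero_pad`, and `fft_radix2` are hypothetical helper names made up for illustration, and the recursive radix-2 algorithm is simply the textbook one that only works when the length is a power of two.

```python
import cmath

def next_pow2(n):
    """Smallest power of two that is >= n (assumes n >= 1)."""
    return 1 << (n - 1).bit_length()

def zero_pad(x, n):
    """Return a copy of x extended with trailing zeros to length n."""
    return list(x) + [0.0] * (n - len(x))

def fft_radix2(x):
    """Textbook recursive radix-2 FFT; the length must be a power of two."""
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])   # FFT of even-indexed samples
    odd = fft_radix2(x[1::2])    # FFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the two half-size results with a twiddle factor
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

# A 3-sample signal is padded to 4 samples before the transform.
samples = [1.0, 2.0, 3.0]
padded = zero_pad(samples, next_pow2(len(samples)))
spectrum = fft_radix2(padded)
```

Calling `fft_radix2(samples)` directly on the 3-sample list raises `ValueError`, which is the software analogue of the size restriction mentioned above.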
They usually have dedicated acceleration hardware, yes: https://www.ti.com/lit/an/sprabb6b/sprabb6b.pdf?ts=174057874...
Yes, almost all DSPs I know have native HW support for the FFT, since it's the bread and butter of signal processing.
I remember hearing about logic to help with deinterleaving the results of the butterfly network after the FFT is done.
Yeah, bit-reversed addressing mode as seen on the dsPIC is an example of this.
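For anyone who hasn't seen it, here is a rough software sketch of the index permutation that a bit-reversed addressing mode produces. This is not dsPIC code; `bit_reverse` and `bit_reversed_order` are hypothetical names, and the only assumption is a power-of-two length, as in an in-place radix-2 FFT, where either the input or the output ends up in this order.

```python
def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of the integer i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)  # shift the low bit of i into r
        i >>= 1
    return r

def bit_reversed_order(n):
    """Bit-reversed index sequence for a power-of-two length n."""
    bits = n.bit_length() - 1  # log2(n) when n is a power of two
    return [bit_reverse(i, bits) for i in range(n)]

order = bit_reversed_order(8)
# Reading a buffer through `order` puts bit-reversed data back in natural order.
scrambled = ["x0", "x4", "x2", "x6", "x1", "x5", "x3", "x7"]
natural = [scrambled[j] for j in order]
```

Hardware with this addressing mode generates the reversed index as part of the address calculation, so the reordering does not need a separate software pass like the one above.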
(Discrete) Fast Fourier Transform implementations:
https://fftw.org/ ; FFTW: https://en.wikipedia.org/wiki/FFTW
gh topic: fftw: https://github.com/topics/fftw
xtensor-stack/xtensor-fftw is similar to numpy.fft: https://github.com/xtensor-stack/xtensor-fftw
NVIDIA cuFFTW, amd/amd-fftw, Intel MKL FFTW
NVIDIA cuFFT (GPU FFT) https://docs.nvidia.com/cuda/cufft/index.html
ROCm/rocFFT (GPU FFT) https://github.com/ROCm/rocFFT .. docs: https://rocm.docs.amd.com/projects/rocFFT/en/latest/
AMD FFT, Intel FFT: https://www.google.com/search?q=AMD+FFT , https://www.google.com/search?q=Intel+FFT
project-gemmi/benchmarking-fft: https://github.com/project-gemmi/benchmarking-fft
"An FFT Accelerator Using Deeply-coupled RISC-V Instruction Set Extension for Arbitrary Number of Points" (2023) https://ieeexplore.ieee.org/document/10265722 :
> with data loading from either specially designed vector registers (V-mode) or RAM off-the-core (R-mode). The evaluation shows the proposed FFT acceleration scheme achieves a performance gain of 118 times in V-mode and 6.5 times in R-mode respectively, with only 16% power consumption required as compared to the vanilla NutShell RISC-V microprocessor
"CSIFA: A Configurable SRAM-based In-Memory FFT Accelerator" (2024) https://ieeexplore.ieee.org/abstract/document/10631146
/? dsp hardware FFT: https://www.google.com/search?q=dsp+hardware+fft