Comment by omneity

2 days ago

If performant FPGAs were more accessible, we'd be able to download models directly into custom silicon, locally, and unlock innovation in inference hardware optimizations. The highest-grade FPGAs also have HBM and are competitive (on paper) with GPUs. To my understanding, this would be a rough hobbyist version of what Groq is doing with its LPUs and Cerebras with its wafer-scale engines.

It's unlikely this will ever happen, but one can always dream.

> highest grade FPGAs also have HBM memory

The three SKUs across Xilinx and Altera that had HBM are no longer manufactured, because the Samsung Aquabolt HBM they used was discontinued.

FPGA for AI only made sense when machine learning had diverse model architectures.

After the Transformer took over AI, FPGAs for AI are effectively dead: Transformer workloads are almost entirely dense matrix multiplication, so an ASIC is the answer.

Modern datacenter GPUs are nearly ASICs now.
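
To make the matrix-math point concrete, here is a minimal sketch (plain NumPy, toy sizes) of scaled dot-product attention, the core of a Transformer layer. Nearly all of the work is two dense matrix multiplications, which is exactly what fixed-function matmul hardware accelerates:

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                  # dense matmul #1
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax: cheap next to the matmuls
        return weights @ V                               # dense matmul #2

    # Toy shapes: 8 tokens, one 16-dim head
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((8, 16)) for _ in range(3))
    print(scaled_dot_product_attention(Q, K, V).shape)   # (8, 16)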

  • Yes, if you're doing what everyone else is doing, you can just use tensor cores and the libraries that optimize for them.

    Conversely, if you're doing something that doesn't map well to tensor cores, you have a problem: every generation, a larger portion of the die is devoted to low/mixed-precision MMA operations. Maybe FPGAs can find a niche that is underserved by current GPUs, but I doubt it. Writing a CUDA/HIP/Kokkos kernel is just so much cheaper and more accessible than VHDL, it's not even funny.

    AMD needs to invest in that: let me write a small FPGA kernel inline in a Python script, compile it instantly, and let me pipe NumPy arrays into it (similar to CuPy RawKernels), as sketched below. If that workflow worked and let me iterate fast, I could be convinced to get deeper into it.
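
    For reference, a rough sketch of the CuPy RawKernel workflow being referenced; a hypothetical FPGA equivalent would presumably swap the inline CUDA C string for HLS or RTL:

        import cupy as cp

        # CUDA C source defined inline as a string; CuPy JIT-compiles it on first launch.
        saxpy = cp.RawKernel(r'''
        extern "C" __global__
        void saxpy(const float a, const float* x, const float* y, float* out, int n) {
            int i = blockDim.x * blockIdx.x + threadIdx.x;
            if (i < n) out[i] = a * x[i] + y[i];
        }
        ''', 'saxpy')

        n = 1 << 20
        x = cp.random.rand(n, dtype=cp.float32)
        y = cp.random.rand(n, dtype=cp.float32)
        out = cp.empty_like(x)

        # Launch as kernel(grid_dims, block_dims, args); edit the string and re-run to iterate.
        saxpy(((n + 255) // 256,), (256,), (cp.float32(2.0), x, y, out, n))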

    • The primary niche of FPGAs is low latency, determinism, and low power consumption. Basically: what if you need an MCU, or many MCUs, but the ones on the market don't have enough processing power?

      The Versal AI Edge line is very power-efficient compared to trying to achieve the same number of FLOPs on a Ryzen-based CPU.