Comment by bigfatkitten

6 days ago

That was certainly the dream, but unfortunately for them it didn't turn out to be a new market.

I don't know enough about hardware to know why - why didn't this story play out as hoped?

  • FPGA dev is just much more painful and more expensive than software dev at every step.

    That's in no small part because the industry & tools seem to be stuck decades in the past. They never had their "GCC moment". But there's also inherent complexity in working at a very low level, having to pay attention to all sorts of details all the time that can't easily be abstracted away.

    There's the added constraint that FPGA code is also not portable without a lot of extra effort. You have to pick some specific FPGA you want to target, and it can be highly non-trivial to port it to a different one.

    And if you do go through all that trouble, you find out that running your code on cloud FPGAs turns out to be pretty damn expensive.

    So in terms of perf per dollar invested, adding SIMD to your hot loop or using a GPU as an accelerator may have a lower ceiling, but it's much, much more bang for the buck and involves a whole lot less pain along the way (a minimal sketch of the SIMD option follows).
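
    To make the "add SIMD to your hot loop" option concrete, here's a minimal sketch, assuming a saxpy-style loop and a compiler that auto-vectorizes (e.g. gcc/clang with -O3 -march=native); the function name is made up for illustration:

      #include <stddef.h>

      /* y[i] += a * x[i], written so the compiler can auto-vectorize it
       * with SSE/AVX: often an afternoon of work, versus months of FPGA
       * development for a comparable speedup. */
      void saxpy(float a, const float *restrict x, float *restrict y, size_t n)
      {
          for (size_t i = 0; i < n; i++)
              y[i] += a * x[i];
      }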

  • It's hard to find places where FPGAs really win. For relatively simple tasks an FPGA can beat just about anything in latency, for instance the serialization/deserialization end of a high-frequency trading system. If a problem has a large working set and needs to store data in DRAM, it needs a memory controller the same way a CPU or GPU does, and that can only be efficient if the system's memory access pattern is predictable.

    You can certainly pencil out FPGA or ASIC systems that would attain high levels of efficient parallelism if there weren't memory bandwidth and latency limits, but there are. If you want to do math that GPUs are good at, you use GPUs. Historically some FPGAs have let you allocate bits in smaller slices, so if you only need 6-bit math you can have 6-bit math, but GPUs are muscling in on that for AI applications.

    FPGAs really are good at the bitwise operations used in cryptography (a short sketch of that kind of logic follows at the end of this comment). They beat CPUs at code cracking and Bitcoin mining, but are in turn beaten by ASICs. However, there is some number of units (say N=10,000) where the economics of the ASIC plus the higher performance will drive you to an ASIC, whether for Bitcoin mining or for the NSA's codebreaking cluster. You might prototype such a system on an FPGA before you get masks made for an ASIC, though.

    For something like the F-35, where you have N=1000 or so, couldn't care less about costs, and might need to reconfigure it for tomorrow's threats, the FPGA looks good.

    One strange low-N case is that of display controllers for retrocomputers. Like it or not, a display controller takes one heck of a parts count to build out of discrete logic, and ASIC display controllers were key to the third generation of home computers, which were made with N=100,000 or so. Things like

    https://www.commanderx16.com/

    are already expensive compared to the Raspberry Pi, so they tend to use either a microcontroller or an FPGA, and the microcontroller tends to win because an ESP32 that costs a few bucks is, amazingly, fast enough to drive a D/A converter at VGA rates or push enough bits for HDMI!
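
    As a concrete illustration of the "bitwise operations used in cryptography" point above, here is the flavor of the logic involved, using a few of the SHA-256 round functions as an example. A CPU executes these a 32-bit word at a time; an FPGA or ASIC can unroll the whole round into dedicated gates and pipeline it, which is where the big wins on hashing workloads come from:

      #include <stdint.h>

      /* Core SHA-256 bit-twiddling (FIPS 180-4): pure AND/XOR/NOT/rotate
       * logic, which maps directly onto FPGA LUTs and routing. */
      static inline uint32_t rotr(uint32_t x, int n) { return (x >> n) | (x << (32 - n)); }
      static inline uint32_t ch(uint32_t x, uint32_t y, uint32_t z)  { return (x & y) ^ (~x & z); }
      static inline uint32_t maj(uint32_t x, uint32_t y, uint32_t z) { return (x & y) ^ (x & z) ^ (y & z); }
      static inline uint32_t big_sigma0(uint32_t x) { return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22); }
      static inline uint32_t big_sigma1(uint32_t x) { return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25); }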

    • >It's hard to find places where FPGAs really win.

      Rapid product development. Got a project that needs to ship in 6-9 months and will be on the market for less than two years in small volume? That's where FPGAs go. Medical, test and measurement, military, video effects, telepresence, etc.

  • Most of my knowledge about FPGAs comes from ex-FPGA people, so take this with a grain of salt:

    First off, clock rates on an FPGA run at about a tenth of CPU clock rates, which means you need a 10× speedup from parallelism just to break even, and that can be a pretty tall order even for a lot of embarrassingly parallel problems.

    (This one is probably a little bit garbled.) My understanding is that the design of FPGAs is such that they're intrinsically worse at delivering raw FLOPS and memory bandwidth than other designs, which also puts a cap on the expected perf boost.

    The programming model is also famously bad. FPGAs are notorious for taking forever to compile, and the end result of waiting half an hour or more might simply be "oops, your kernel is too large." Also, to a degree, a lot of the benefit of an FPGA is in being able to, say, do a 4-bit computation instead of having to waste logic on a full 8 bits, which means your code also needs to be tailored quite heavily for the FPGA (see the sketch below), and that makes it less accessible for most programmers.
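
    As a rough illustration of that last point: even on a CPU, exploiting 4-bit data means contorting your code, as in the SWAR trick below that packs eight 4-bit lanes into one 32-bit word; on an FPGA you would instead just instantiate a 4-bit-wide datapath and spend no extra logic at all. This is only a sketch of the general idea in C, not FPGA code:

      #include <stdint.h>

      /* Add eight packed 4-bit lanes in parallel (SWAR): mask off each
       * lane's top bit so carries cannot cross lane boundaries, add,
       * then restore the top bits with XOR.  Each lane ends up holding
       * (a + b) mod 16. */
      uint32_t add_nibbles(uint32_t a, uint32_t b)
      {
          uint32_t low  = (a & 0x77777777u) + (b & 0x77777777u);
          uint32_t high = (a ^ b) & 0x88888888u;
          return low ^ high;
      }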

  • Tooling, mostly. To write fast code for CPUs you need a good optimizing compiler, like clang or gcc. Imagine how much work has gone into making them good; we're talking thousands of man-years over several decades. You need just as good tooling for FPGAs, and it takes just as much effort to produce, except the market is orders of magnitude smaller. You also can't "get help" from the open source community, since high-end FPGAs are way too expensive for most hackers.

    Intel tried to get around this problem by having a common framework. So one compiler (based on clang) with multiple backends for their CPUs, FPGAs, and GPUs. But in practice it doesn't work. The architectures are too different.

  • There is nothing quite like gcc or LLVM for FPGAs yet. FPGA tooling is still stuck in the world of proprietary compilers and closed software stacks, which makes that whole segment of the industry move more slowly and with more friction. This is just starting to break with Yosys and related tools, which are showing wild advantages in efficiency over some of the proprietary tooling, but they still only support a fraction of available chips, mostly the smaller ones.

  • I'm just a casual observer, but I'm pretty sure one hard thing about FPGAs is preventing abuse: a customer could easily set up a ring oscillator that burns out all the LUTs. Another thing is that FPGAs are about 10x slower than dedicated logic, so CPUs/GPUs beat them for a lot of applications. Plus, there aren't a lot of logic designers in the first place, and software skills don't transfer over very well. For example, a multiplier is about the size of 8kb of RAM, so lookups and complex flow are way more expensive than just multiplying a value again (kinda like GPUs, except as if you only had an L1 cache without main memory). A toy contrast of the two styles follows.
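
    To illustrate the lookup-versus-recompute trade-off in that last sentence, here is a toy contrast of the two styles in C (the gain example and function names are made up, and the area figure above is the commenter's). In FPGA terms, the table version ties up scarce on-chip RAM while the multiply version is one small hardwired multiplier, so recomputing tends to be the cheaper choice:

      #include <stdint.h>

      /* Two ways to scale an 8-bit sample by a fixed-point gain. */
      static uint8_t gain_lut[256];       /* lookup style: 256-entry table */

      void init_lut(uint8_t gain)
      {
          for (int i = 0; i < 256; i++)
              gain_lut[i] = (uint8_t)((i * gain) >> 8);
      }

      uint8_t scale_lookup(uint8_t x)                { return gain_lut[x]; }
      uint8_t scale_compute(uint8_t x, uint8_t gain) { return (uint8_t)((x * gain) >> 8); }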

    • Not sure why I'm being downvoted; would those who downvoted me explain why? I try to be accurate, so if I missed any important details I'd like to know :)
