Comment by kens

1 day ago

Author here for your 8087 questions. I find adders and ALUs interesting because they are key to the performance of a system and every system implements them differently.


sebgan 1 day ago

No immediate questions, but happy to have some great weekend reading. A quick pass through finds one of the best and clearest explainers I've seen. Thanks for this and all the materials you produce.

mitthrowaway2 1 day ago

Do you know about how many transistors are needed to implement the adder (or the FPU as a whole)? And how it scales with the width of the numbers (16 bit, 32 bit, etc)?

I've been curious about transistor counts for floating point units for a while, but it's hard to find information about them.

kens 1 day ago

I count approximately 2014 transistors (including pull-ups) for the 69-bit adder. Each block of four bits takes approximately 117 transistors.
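As a rough cross-check of those two figures, here is a back-of-the-envelope sketch in Python. It assumes the transistor count scales linearly with width at about 117 transistors per 4-bit block, which is an extrapolation for illustration and not a measured count; the function name `estimate_adder_transistors` is hypothetical.

```python
def estimate_adder_transistors(bits, per_block=117, block_bits=4):
    """Linear estimate: (bits / block_bits) blocks at per_block transistors each.

    Assumption: cost grows linearly with width, as in a chained design.
    Faster adders (e.g. prefix adders) grow faster than this.
    """
    return bits / block_bits * per_block


for width in (16, 32, 69):
    print(width, round(estimate_adder_transistors(width)))
```

For 69 bits this gives roughly 2018, which lines up with the counted figure of approximately 2014. The 16-bit and 32-bit values are extrapolations only.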

Aardwolf 1 day ago

Any idea how much adder designs changed on modern CPUs compared to back then? I mean there's only so much you can optimize in those, I think...

kens 1 day ago
Even by the time of the Pentium, they had moved to much more complicated adders like Kogge-Stone. I wrote about it here: https://www.righto.com/2025/01/pentium-carry-lookahead-rever...
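For anyone who has not met Kogge-Stone, a minimal Python sketch of the parallel-prefix idea follows. It is a behavioral model using bit tricks, not the Pentium's circuit; the function name `kogge_stone_add` and the default width are illustrative assumptions.

```python
def kogge_stone_add(a, b, width=8):
    """Add two width-bit values with a Kogge-Stone style prefix network.

    Returns (sum, carry_out). Carry-in is assumed to be 0.
    """
    mask = (1 << width) - 1
    g = a & b & mask        # generate: this bit produces a carry by itself
    p = (a ^ b) & mask      # propagate: this bit passes an incoming carry on
    half_sum = p            # saved for the final XOR

    # Each stage doubles the span covered by every (g, p) pair,
    # so only about log2(width) stages are needed.
    dist = 1
    while dist < width:
        g = g | (p & (g << dist))
        p = p & (p << dist)
        dist <<= 1

    carries = (g << 1) & mask          # carry into bit i comes from bits 0..i-1
    carry_out = (g >> (width - 1)) & 1
    return half_sum ^ carries, carry_out


print(kogge_stone_add(0xFF, 0x01))
```

The loop runs log2(width) times instead of width times, which is where the speed comes from; the price is that every stage needs a full row of cells and a lot of wiring between them.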
- B1FF_PSUVM 1 day ago
  
  Do you have anything on those TRW floating point chips that used to titillate junior engineers in trade mag advertisements before that?
rcxdude 1 day ago
There's a surprising amount of optimization possible in them. You can improve the latency of them substantially at the cost of a lot more transistors.
- B1FF_PSUVM 1 day ago
  
  For example, an adder's total delay depends on its carry chain. With N 4-bit slices, the last slice has to wait for the carry to propagate through all N-1 previous slices.
  But if you duplicate each slice, you can compute the results for both carry-in = 0 and carry-in = 1 at the same time, then select whichever one is correct once the real carry arrives (a carry-select adder). Total time: 1 add plus N-1 switches.
  All for double (and change) the hardware. Cheap.
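The duplicate-and-select scheme described above can be sketched in Python as a behavioral model. The function names `ripple_add` and `carry_select_add` and the 4-bit slice width are assumptions for illustration, and the software loop only mimics what the hardware does in parallel.

```python
def ripple_add(a, b, carry_in, width=4):
    """Add two width-bit values one bit at a time; return (sum, carry_out)."""
    total = 0
    carry = carry_in
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return total, carry


def carry_select_add(a, b, n_slices, slice_width=4):
    """Carry-select addition over n_slices slices; return (sum, carry_out)."""
    mask = (1 << slice_width) - 1

    # Stage 1: every slice computes both candidate results.
    # In hardware all of these run at once, taking the time of one slice add.
    candidates = []
    for s in range(n_slices):
        sa = (a >> (s * slice_width)) & mask
        sb = (b >> (s * slice_width)) & mask
        candidates.append((ripple_add(sa, sb, 0, slice_width),
                           ripple_add(sa, sb, 1, slice_width)))

    # Stage 2: walk the slices, picking the precomputed result that
    # matches the real incoming carry (one multiplexer delay per slice).
    result = 0
    carry = 0
    for s, (if_zero, if_one) in enumerate(candidates):
        part, carry = if_one if carry else if_zero
        result |= part << (s * slice_width)
    return result, carry


print(carry_select_add(0xFFFF, 0x0001, n_slices=4))
```

This sketch duplicates the lowest slice as well for simplicity; since its carry-in is known up front, real hardware would not need the second copy there.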
bsder 17 hours ago

I believe that every single adder architecture we now use was known by the 1980s. The "optimization" is matching the theory to the engineering of the day.
The reason you don't use prefix adders in 1980 is that you can't possibly route them because you don't have enough metal. So instead, you use chunks of Manchester carry chain because the "tapping internal nodes" that everybody cites allows you to route nodes in diffusion and polysilicon instead of having to use metal.
Of course, THAT only works because you have 5V (or more) and can connect lots of transistors in series and still have them work. As your voltage falls you can't connect as many transistors in series, so you switch to architectures that prefer active gates over passthroughs and long chains.
So, as your available metal layers, supply voltage, transistor speed, threshold voltages, capacitive load and power dissipation all shift over the engineering landscape, your "optimization" shifts with it.
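To make the Manchester carry chain mentioned above concrete, here is a purely logical Python sketch of one chunk. It models each bit as generate, propagate, or kill and ignores everything electrical (precharging, pass-transistor voltage drop, timing); the function names and the 4-bit chunk width are assumptions for illustration.

```python
def manchester_carries(a, b, carry_in=0, width=4):
    """Return (carry into each bit, carry out of the chunk)."""
    carry = carry_in
    carries_in = []
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        carries_in.append(carry)
        if x and y:
            carry = 1      # generate: force the carry line active
        elif not x and not y:
            carry = 0      # kill: force the carry line inactive
        # otherwise propagate: the pass transistor connects this node
        # to the previous one, so the carry is simply passed along
    return carries_in, carry


def manchester_add(a, b, carry_in=0, width=4):
    """Sum bits are a XOR b XOR the carry tapped at each internal node."""
    carries_in, carry_out = manchester_carries(a, b, carry_in, width)
    total = 0
    for i, c in enumerate(carries_in):
        total |= (((a >> i) ^ (b >> i) ^ c) & 1) << i
    return total, carry_out


print(manchester_carries(0b0111, 0b0001))
print(manchester_add(0b0111, 0b0001))
```

The propagate case is the one that puts transistors in series: a run of propagating bits is a run of pass transistors between the carry source and the node being read, which is why the chain is broken into short chunks.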

m1333 1 day ago

> take two clock cycles to complete an addition.

How does the clocking work exactly? The circuit is fed A and B and up down up down clock and then the output appears? How does the consumer (circuit) know when to read the result? Is there a "result is ready" flag? How long does the result stay stable? One full clock cycle? So many questions...

JdeBP 1 day ago
The adder is not clocked. You can see from the diagrams that there are no clock inputs. The clock cycles comment is more an expression of the length of time that it takes before all of the carry rippling and whatnot settles down.
- kens 1 day ago
  
  In more detail, the microcode engine normally executes one micro-instruction per cycle. For addition, the engine is blocked for one extra cycle to give the result time to percolate through the adder.
  There is some complicated timing within a clock cycle with slightly delayed clocks and whatnot, for instance, to precharge the carry lines at the beginning of the operation. The 8087 is mostly synchronous with the clock, but they "cheat" in many places.