
Comment by rcxdude

1 day ago

There's a surprising amount of optimization possible in them. You can cut their latency substantially at the cost of a lot more transistors.

For example, an adder's total delay depends on a carry chain. If you have N 4-bit slices, the last slice has to wait for the carry to propagate through all N-1 previous slices.

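But if you duplicate all your slices, each slice can compute its result for both carry-in = 0 and carry-in = 1 in parallel. Then each slice just selects whichever result matches the real carry once it arrives: total time is 1 slice add plus N-1 switches (multiplexer delays). This is the carry-select adder.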

All for a bit more than double the hardware (the duplicated slices plus the multiplexers). Cheap.
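
To make the select step concrete, here is a rough Python sketch of the idea. It only models the logic, not the timing, and the names (`slice_add`, `carry_select_add`, `width`, `n_slices`) are made up for illustration. In real hardware the two candidate results of every slice are computed simultaneously; the loop below just walks the chain of selects.

```python
def slice_add(a, b, carry_in, width=4):
    """Add two width-bit values plus a carry-in; return (sum, carry_out)."""
    total = a + b + carry_in
    mask = (1 << width) - 1
    return total & mask, total >> width


def carry_select_add(a, b, n_slices, width=4):
    """Software model of a carry-select adder built from n_slices slices.

    Each slice precomputes its result for carry-in 0 and carry-in 1.
    The real carry then only has to pick one of the two (a mux),
    instead of rippling through the slice's full adder logic.
    """
    mask = (1 << width) - 1
    result = 0
    carry = 0  # carry into the least significant slice
    for i in range(n_slices):
        shift = i * width
        a_i = (a >> shift) & mask
        b_i = (b >> shift) & mask
        # Both candidates; in hardware these exist for all slices at once.
        sum0, carry0 = slice_add(a_i, b_i, 0, width)
        sum1, carry1 = slice_add(a_i, b_i, 1, width)
        # The "switch": select the candidate that matches the incoming carry.
        s, carry = (sum1, carry1) if carry else (sum0, carry0)
        result |= s << shift
    return result, carry
```

For example, `carry_select_add(0xFFFF, 0x0001, 4)` returns `(0, 1)`: the sum wraps to zero and the carry comes out of the top slice after passing through one select per slice.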