Comment by yodon
11 hours ago
So excited to see this - the big advantage of 1.58 bits (ternary weights: -1, 0 or +1) is that there are no multiplications at inference time, so you can run these models on radically simpler and cheaper hardware.
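To make the "no multiplications" point concrete, here is a minimal Python sketch. The function name `ternary_dot` is hypothetical and not from any library. With every weight in {-1, 0, +1}, each term of a dot product is an add, a subtract or a skip. Any per-tensor scaling factor that a real quantization scheme might apply afterwards is left out.

```python
from typing import Sequence

def ternary_dot(weights: Sequence[int], activations: Sequence[float]) -> float:
    """Dot product with weights in {-1, 0, +1}, using only adds and subtracts."""
    if len(weights) != len(activations):
        raise ValueError("weights and activations must have the same length")
    acc = 0.0
    for w, x in zip(weights, activations):
        if w == 1:
            acc += x           # +1: add the activation
        elif w == -1:
            acc -= x           # -1: subtract it
        elif w != 0:
            raise ValueError(f"weight {w} is not ternary")
        # 0: the activation is skipped entirely
    return acc
```

For example, `ternary_dot([1, 0, -1, 1], [0.5, 2.0, 1.5, 3.0])` computes 0.5 - 1.5 + 3.0 = 2.0 without a single multiply.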
At 4 bits, you could just have a hard-wired table lookup. Two 4-bit values in, 256-entry table. You can have saturating arithmetic and a post-processing function for free. Somebody must be building hardware like that.
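A software model of that idea, as a minimal sketch: the two 4-bit operands are concatenated into an 8-bit index, so all 16 x 16 = 256 results can be precomputed, and saturation plus any post-processing are baked into the entries. The names `build_mul_lut` and `lut_mul`, the signed two's-complement operands, and the 6-bit saturating output width are all illustrative assumptions, not anything from a real hardware design.

```python
from typing import Callable, List

def to_signed4(nibble: int) -> int:
    """Interpret a 4-bit pattern (0..15) as two's complement (-8..7)."""
    return nibble - 16 if nibble >= 8 else nibble

def build_mul_lut(out_bits: int = 6,
                  post: Callable[[int], int] = lambda v: v) -> List[int]:
    """Precompute all 16 x 16 products of two signed 4-bit operands.

    Each entry is the product saturated to a signed `out_bits` range
    (an assumed output width) and then passed through `post`.
    """
    lo, hi = -(1 << (out_bits - 1)), (1 << (out_bits - 1)) - 1
    table = []
    for index in range(256):
        a = to_signed4(index >> 4)           # high nibble: first operand
        b = to_signed4(index & 0xF)          # low nibble: second operand
        saturated = max(lo, min(hi, a * b))  # clamp instead of wrapping
        table.append(post(saturated))
    return table

def lut_mul(table: List[int], a_bits: int, b_bits: int) -> int:
    """'Multiply' two 4-bit patterns with a single table read."""
    return table[((a_bits & 0xF) << 4) | (b_bits & 0xF)]

table = build_mul_lut()
relu_table = build_mul_lut(post=lambda v: max(v, 0))
```

With the default 6-bit output, `lut_mul(table, 0b1000, 0b1000)` is (-8) x (-8) = 64 clamped to 31, and `relu_table` folds a ReLU into the same single lookup.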
Low-level software engineers use lookup tables.
Hardware engineers realise that a compiler (the logic synthesis tool) will almost always find some combination of gates that is smaller/faster than the equivalent table.
A LUT is pretty wasteful. You only have a one-bit significand, so the mantissa and sign bits are boolean binops, and the exponent is a 2-bit adder.
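A rough sketch of that decomposition, under a simplified assumed format with value (-1)^s * 2^e * (1 + m/2), a 2-bit exponent `e` and a 1-bit mantissa `m`. Real 4-bit float formats also have an exponent bias and an encoding for zero, which this ignores, and rounding the result back to 4 bits is left out. The sign is one XOR, the exponent is a 2-bit add, and the product significand (1.0, 1.5 or 2.25) can be wired directly from the AND and XOR of the two mantissa bits. The function names are hypothetical.

```python
def decode(s: int, e: int, m: int) -> float:
    """Value of the assumed format: (-1)^s * 2^e * (1 + m/2)."""
    return (-1) ** s * 2.0 ** e * (1 + m / 2)

def mul_fields(s1: int, e1: int, m1: int, s2: int, e2: int, m2: int):
    """Multiply two operands using only gates and a small adder.

    Returns (sign, exponent, significand) where the significand is an
    unsigned 4-bit value in units of 1/4 (4 = 1.0, 6 = 1.5, 9 = 2.25).
    """
    sign = s1 ^ s2                 # one XOR gate
    exp = e1 + e2                  # 2-bit adder with carry out: 0..6
    both = m1 & m2                 # AND of the mantissa bits
    either = m1 ^ m2               # XOR of the mantissa bits
    # Wire the significand bits directly: 0100, 0110 or 1001.
    significand = (both << 3) | ((both ^ 1) << 2) | (either << 1) | both
    return sign, exp, significand

def value_of(sign: int, exp: int, significand: int) -> float:
    """Convert the (sign, exponent, significand) fields back to a float."""
    return (-1) ** sign * 2.0 ** exp * significand / 4
```

Checking all 256 operand pairs against ordinary float multiplication confirms that `value_of(*mul_fields(...))` is exact for this simplified format.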
And you can do that at 1 bit too, where the hardware will be even smaller and cheaper.
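Assuming "1-bit" means both operands are restricted to +1/-1, as in binarized networks, a dot product reduces to XNOR plus popcount. This is a well-known technique, swapped in here for illustration and not necessarily what the commenter had in mind. A minimal sketch with bits packed into a Python int (bit = 1 means +1); `pack` and `binary_dot` are hypothetical names:

```python
from typing import Sequence

def pack(signs: Sequence[int]) -> int:
    """Pack +1/-1 values into an int: bit i is 1 when signs[i] == +1."""
    word = 0
    for i, v in enumerate(signs):
        if v == 1:
            word |= 1 << i
        elif v != -1:
            raise ValueError(f"value {v} is not +1 or -1")
    return word

def binary_dot(a: int, b: int, n: int) -> int:
    """Dot product of two packed length-n +/-1 vectors via XNOR and popcount."""
    mask = (1 << n) - 1
    agree = ~(a ^ b) & mask              # XNOR: 1 where the signs match
    matches = bin(agree).count("1")      # popcount
    # Each match contributes +1 and each mismatch -1: matches - (n - matches).
    # The factor of 2 is just a left shift in hardware.
    return 2 * matches - n
```

For instance, `binary_dot(pack([1, -1, 1, 1]), pack([1, 1, -1, 1]), 4)` gives 0, the same as summing the elementwise products 1 - 1 - 1 + 1.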