Comment by wongarsu

3 days ago

I've also always thought that this is an interesting opportunity for custom hardware. Two-bit addition is incredibly cheap in hardware, especially compared to anything involving floating point. You could make huge vector instructions on the cheap, then connect them to the fastest memory you can buy, and you have a capable inference chip.
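To make the point concrete, here's a minimal sketch (my own illustration, not from the thread) of why ternary-weight inference is so cheap: with weights restricted to {-1, 0, +1}, a dot product needs no multipliers at all. Each weight just selects add, subtract, or skip.

```c
#include <assert.h>

/* Sketch: ternary-weight dot product with no multiplies.
   Each weight in {-1, 0, +1} selects add, subtract, or skip,
   which is why the hardware reduces to cheap adders. */
float ternary_dot(const signed char *w, const float *x, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; i++) {
        if (w[i] > 0)      acc += x[i];  /* weight +1: add      */
        else if (w[i] < 0) acc -= x[i];  /* weight -1: subtract */
        /* weight 0: contributes nothing, skip entirely */
    }
    return acc;
}
```

In silicon, the per-lane logic is an adder plus a little select logic, rather than a floating-point multiplier per lane.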

You'd still need full GPUs for training, but for inference the hardware would be orders of magnitude simpler than what Nvidia is making

These are trits, which provide their own efficiencies.

Interestingly, a trit x float multiplier is cheaper than a trit x integer multiplier in hardware if you're willing to ignore things like NaNs.

0 and 1 are trivial, just a mux for identity and zero. But because floats are sign-magnitude, multiply by -1 is just an inverter for the sign bit, whereas for two's-complement integers you need a bitwise inverter and a full incrementer.
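A quick sketch of that asymmetry (my own illustration), ignoring NaN/Inf corner cases as the parent says: negating an IEEE-754 float is a single XOR of the sign bit, while negating a two's-complement integer is invert-all-bits-plus-one, i.e. it needs the incrementer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Float negation: flip the top (sign) bit only -- one inverter
   in hardware. Ignores NaN/Inf subtleties, as noted above. */
float flip_sign(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* portable type-pun */
    bits ^= 0x80000000u;             /* invert the sign bit */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Integer negation in two's complement: bitwise invert, then
   increment -- a full carry chain, not just an inverter. */
int32_t negate_int(int32_t x) {
    return (int32_t)(~(uint32_t)x + 1u);
}
```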

  • Do you know a good reference to learn more about this (quantizing weights to 1.58 bits, and trit arithmetic)?

    • There's lots of literature on quantizing weights (including trits and binary) going back 15+ years. Nothing to hand right now but it's all on arxiv.

      The relevant trit arithmetic should be on display in the linked repo (I haven't checked). Or try working it out for the uncompressed 2 bit form with a pen and paper. It's quite trivial. Try starting with a couple bitfields (inputs and weights), a couple masks, and see if you can figure it out without any help.
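If you'd rather see one possible answer to the exercise: here's a sketch of a common trick (my own guess at the encoding, not checked against the linked repo). Store the trit weights as two bitmasks, one marking the +1 positions and one marking the -1 positions, and the dot product against a packed binary {0,1} input becomes two ANDs and two popcounts. `__builtin_popcountll` is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 2-bit trit encoding as two bit-planes:
   `plus`  has a 1 wherever the weight is +1,
   `minus` has a 1 wherever the weight is -1
   (both 0 where the weight is 0). Against a packed binary
   input x, the dot product is just masked popcounts. */
int trit_dot(uint64_t plus, uint64_t minus, uint64_t x) {
    return __builtin_popcountll(plus & x)
         - __builtin_popcountll(minus & x);
}
```

64 trit multiply-accumulates collapse into a couple of word-wide logic ops, which is the kind of thing that makes the custom-hardware angle attractive.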

You only need GPUs if you assume the training is gradient descent. GAs (genetic algorithms) or anything else that can handle nonlinearities would be fine, and possibly fast enough to be interesting.