Comment by WithinReason

4 months ago

yes but this is not 1 bit matmul, it's 1.58 bits with expensive unpacking
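For illustration only: the 1.58 figure is log2(3), the information content of one ternary value. One dense way to get close to it is to store five ternary values per byte as a base-3 number (3^5 = 243 fits in 8 bits, so 1.6 bits per value). The sketch below assumes that scheme; whether the repo packs this way is not stated in the thread, and `pack5` / `unpack5` are hypothetical names. It shows why unpacking costs a chain of divisions instead of a shift and mask.

```python
import math

# Information content of one ternary value, roughly 1.585 bits.
BITS_PER_TRIT = math.log2(3)


def pack5(trits):
    """Pack five values from {-1, 0, +1} into one byte as a base-3 number.

    Assumed layout for illustration, not taken from the repo under discussion.
    """
    if len(trits) != 5 or any(t not in (-1, 0, 1) for t in trits):
        raise ValueError("expected five values from {-1, 0, +1}")
    byte = 0
    for t in reversed(trits):
        byte = byte * 3 + (t + 1)  # map -1, 0, +1 to base-3 digits 0, 1, 2
    return byte


def unpack5(byte):
    """Inverse of pack5: needs a divide/modulo per value, not just a shift."""
    trits = []
    for _ in range(5):
        byte, digit = divmod(byte, 3)
        trits.append(digit - 1)
    return trits
```

A 2-bits-per-value layout avoids the divisions at the cost of about 26% more storage.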

5 comments

ismailmaj 4 months ago

The title and the repo use "1-bit" when they mean 1.58-bit ternary values. That doesn't change any of my arguments (it's still XORs and popcounts).

WithinReason 4 months ago
How do you do ternary matmul with popcnt on 1.58 bit packed data?
- ismailmaj 4 months ago
  
  Assuming 2 bits per value (bit 0 is the sign, bit 1 is whether the value is nonzero):
  actv = A[_:1] & B[_:1]
  sign = A[_:0] ^ B[_:0]
  dot = pop_count(actv & ~sign) - pop_count(actv & sign)
  It can probably be made more efficient with a column-first format.
  Since we are in CPU land, we mostly deal with dot products sized to fit the cache. I'm not assuming a tiled matmul instruction, which would be unlikely to support this weird 1-bit format anyway.
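
A minimal runnable sketch of the formula above, assuming the sign bits and the nonzero bits are already split into two separate bit planes (one reading of the "column-first" idea). The names `pack_planes`, `popcount` and `ternary_dot` are hypothetical, and Python integers stand in for machine words; a real kernel would use SIMD AND/XOR and a hardware popcount.

```python
def pack_planes(values):
    """Pack values from {-1, 0, +1} into two bit planes: (sign, nonzero).

    Bit i of `nonzero` is set when values[i] != 0.
    Bit i of `sign` is set when values[i] == -1.
    """
    sign = 0
    nonzero = 0
    for i, v in enumerate(values):
        if v not in (-1, 0, 1):
            raise ValueError("values must be -1, 0 or +1")
        if v != 0:
            nonzero |= 1 << i
        if v < 0:
            sign |= 1 << i
    return sign, nonzero


def popcount(x):
    """Count set bits of a non-negative integer."""
    return bin(x).count("1")


def ternary_dot(a, b):
    """Dot product of two packed ternary vectors using AND, XOR and popcount."""
    a_sign, a_nonzero = a
    b_sign, b_nonzero = b
    actv = a_nonzero & b_nonzero   # positions where both operands are nonzero
    sign = a_sign ^ b_sign         # positions where the product would be negative
    # actv is non-negative, so actv & ~sign is non-negative as well.
    return popcount(actv & ~sign) - popcount(actv & sign)
```

For example, `ternary_dot(pack_planes([1, -1, 0, 1, -1]), pack_planes([1, 1, -1, 0, -1]))` agrees with the ordinary dot product of the two lists.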
  