
Comment by phkahler

3 hours ago

>> I wanna see an inference chip where the weights are part of the rom of the chip.

I've been wondering about that for a while now. For a lot of tasks putting weights in ROM is probably OK. OTOH:

>> There would be 1 multiplier per weight...

I'm not sure that is a good idea. Maybe if it's quantized down to 2 bits... Otherwise, maybe a small ROM near each multiplier (or row of them, or whatever), so the multipliers could handle N distinct matrix operations without having to move the data from far away.
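To make the 2-bit idea a bit more concrete, here's a rough Python sketch. Everything in it is hypothetical and not a description of any real chip. With 2-bit two's-complement codes the weights are limited to {-2, -1, 0, 1}, so each "multiplier" only has to negate and/or shift by one bit. The made-up `LocalRomMac` class stands in for a MAC unit with a small local ROM holding N weight vectors. `op` selects which of the N matrix operations to run, so the weights never have to come from far away.

```python
# Hypothetical sketch: a MAC unit with a small local "ROM" of 2-bit weights.
# 2-bit two's-complement codes map to weights {-2, -1, 0, 1}, so every
# "multiply" reduces to a negate and/or a 1-bit shift (no real multiplier).

CODE_TO_WEIGHT = {0b00: 0, 0b01: 1, 0b10: -2, 0b11: -1}


def mul_2bit(x: int, code: int) -> int:
    """Multiply x by a 2-bit weight code using only shift and negate."""
    if code == 0b00:
        return 0
    if code == 0b01:
        return x
    if code == 0b10:
        return -(x << 1)  # weight -2: shift left by one, then negate
    return -x  # code 0b11, weight -1


class LocalRomMac:
    """A MAC unit with N weight vectors stored right next to it."""

    def __init__(self, rom: list[list[int]]):
        # rom[k] holds the 2-bit codes for the k-th matrix operation
        self.rom = [tuple(row) for row in rom]

    def run(self, op: int, activations: list[int]) -> int:
        codes = self.rom[op]
        if len(codes) != len(activations):
            raise ValueError("activation length must match the stored row")
        acc = 0
        for x, code in zip(activations, codes):
            acc += mul_2bit(x, code)  # accumulate shift/negate results
        return acc


# Two stored "matrix operations" sharing the same MAC unit.
mac = LocalRomMac([[0b01, 0b11, 0b10, 0b00], [0b11, 0b11, 0b01, 0b01]])
acts = [3, 5, 7, 2]
print(mac.run(0, acts), mac.run(1, acts))  # prints "-16 1"
```

The same trick works for ternary weights {-1, 0, 1}; at these bit widths the "multiplier" is little more than a mux feeding an adder, which is what would make one per weight (or per small ROM) plausible.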

Another fun thought is to put a row of MAC units on a DRAM chip, so a DRAM row would be a vector. Row size might be 64 Kbit, or 8K weights if they're 8-bit. This also keeps the weights and calcs on the same chip. I'm not sure this would put enough multipliers on one chip, though. Systolic arrays can have tens or hundreds of thousands of MAC units, each doing one op per clock cycle.
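A toy model of the DRAM idea, assuming each stored row is one row of the weight matrix and there's one MAC lane per weight in the row. The `DramMacBank` class and its layout are made up for illustration. It also checks the arithmetic: 64 Kbit / 8 bits = 8192 weights per row.

```python
import random
from array import array

ROW_BITS = 64 * 1024                    # 64 Kbit DRAM row (figure from the comment)
WEIGHT_BITS = 8                         # int8 weights
ROW_WEIGHTS = ROW_BITS // WEIGHT_BITS   # 8192 weights per row


class DramMacBank:
    """Hypothetical DRAM bank with one MAC lane per weight in a row.

    Each stored row is one row of the weight matrix. Opening a row hands all
    ROW_WEIGHTS weights to the MAC lanes at once; the lanes multiply them by
    the broadcast activation vector and the products are summed.
    """

    def __init__(self, rows):
        for r in rows:
            if len(r) != ROW_WEIGHTS:
                raise ValueError("each row must hold exactly ROW_WEIGHTS weights")
        self.rows = [array("b", r) for r in rows]  # "b" = signed 8-bit
        self.row_activations = 0

    def matvec(self, x):
        if len(x) != ROW_WEIGHTS:
            raise ValueError("activation vector must match the row width")
        out = []
        for row in self.rows:
            self.row_activations += 1  # one row opened per output element
            # In hardware all lanes fire in parallel; here we just loop.
            out.append(sum(w * a for w, a in zip(row, x)))
        return out


rng = random.Random(0)
W = [[rng.randint(-128, 127) for _ in range(ROW_WEIGHTS)] for _ in range(4)]
x = [rng.randint(-128, 127) for _ in range(ROW_WEIGHTS)]
bank = DramMacBank(W)
y = bank.matvec(x)
print(ROW_WEIGHTS, bank.row_activations)  # prints "8192 4"
```

This also shows the throughput worry: one bank gives 8192 MACs per row opening, compared with a systolic array that keeps tens or hundreds of thousands of MACs busy every clock cycle.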

Analog chips could also be very interesting, instead of using digital signals and processing them against the weights in the ROM. I have no idea if that scales to such big models, though.
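One common way to do this in analog is a resistive crossbar. I'm assuming that approach here; the comment doesn't name one. Weights become conductances, inputs become voltages, Ohm's law does the multiply, and Kirchhoff's current law does the sum on each output wire. Below is an idealized Python sketch. The function name, the `g_max` value, and the Gaussian noise model are all illustrative assumptions, not taken from any real device.

```python
import random


def crossbar_mvm(weights, voltages, g_max=1e-4, noise_std=0.0, rng=None):
    """Idealized analog crossbar matrix-vector multiply (illustrative only).

    Each signed weight w maps to a pair of conductances (G+, G-) because a
    physical conductance cannot be negative. Inputs are applied as voltages;
    by Ohm's law each cell passes I = G * V, and by Kirchhoff's current law
    the currents on one output wire add up. noise_std is an assumed model of
    device variation: relative Gaussian noise on every conductance.
    """
    rng = rng or random.Random(0)
    w_max = max(abs(w) for row in weights for w in row) or 1.0
    scale = g_max / w_max  # weight units -> siemens (largest weight = g_max)
    out = []
    for row in weights:  # one output wire per weight row
        i_pos = i_neg = 0.0
        for w, v in zip(row, voltages):
            g_pos = max(w, 0.0) * scale * (1.0 + rng.gauss(0.0, noise_std))
            g_neg = max(-w, 0.0) * scale * (1.0 + rng.gauss(0.0, noise_std))
            i_pos += g_pos * v  # Ohm's law, summed on the wire
            i_neg += g_neg * v
        out.append((i_pos - i_neg) / scale)  # differential read, back to weight units
    return out


W = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
x = [0.1, 0.2, -0.3]
exact = [sum(w * v for w, v in zip(row, x)) for row in W]
ideal = crossbar_mvm(W, x)
noisy = crossbar_mvm(W, x, noise_std=0.05, rng=random.Random(1))
```

With `noise_std=0` the crossbar matches the exact result up to float rounding; with noise, every result drifts a little. Precision ends up limited by device variation and read noise rather than by bit width, and whether that holds up for big models is exactly the open question.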