Comment by yvdriess

5 days ago

> Unstructured sparsity cannot be implemented in hardware efficiently if you still want to do matrix multiplication.

Hard disagree. It is certainly an order of magnitude harder to design hardware for sp x sp MM, and it requires a paradigm shift to do sparse compute efficiently, but hardware architectures that do it efficiently exist both in research and commercially. The same kind of architecture is needed to scale op-graph compute: you see solutions at the smaller scale in FPGAs and reconfigurable/dataflow accelerators, and at the larger scale in Intel's PIUMA and Cerebras. I've been involved in co-design work on GraphBLAS on the software side targeting one of the aforementioned hardware platforms: the main obstacle to developing SpMSpM hardware is that the necessary capital and engineering investment is being prioritized toward current frontier AI model accelerators, not a lack of proven results.
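To make concrete what sp x sp MM involves, here is a minimal Python sketch of a Gustavson-style SpGEMM over CSR matrices (the function name and layout are mine, not from GraphBLAS or any particular library). The data-dependent gathers of B's rows are exactly the irregular access pattern that a conventional dense-matmul datapath handles poorly:

```python
# Hypothetical illustration: Gustavson-style SpGEMM (sparse x sparse
# matrix multiply) on CSR matrices. The irregular, data-dependent
# gathers of B's rows are what make unstructured sparsity awkward
# for conventional matmul hardware.

def spgemm_csr(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val):
    """Multiply two CSR matrices; return the product in CSR form."""
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(len(a_ptr) - 1):
        acc = {}  # sparse accumulator for row i of C
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_idx[p], a_val[p]
            # Gather row k of B -- which row depends on A's data:
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[q]
                acc[j] = acc.get(j, 0.0) + a_ik * b_val[q]
        for j in sorted(acc):
            c_idx.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val

# 2x2 example: A = [[1, 0], [0, 2]], B = [[0, 3], [4, 0]]
# so A @ B = [[0, 3], [8, 0]]
c_ptr, c_idx, c_val = spgemm_csr([0, 1, 2], [0, 1], [1.0, 2.0],
                                 [0, 1, 2], [1, 0], [3.0, 4.0])
```

Nothing here maps onto a fixed systolic array: the inner loop bounds and the addresses it touches are only known once A's and B's nonzero structure is read, which is why dataflow-style machines fit the problem better.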