
Comment by yvdriess

6 days ago

What do you mean by "work better" here? If you mean better accuracy, then no: they are not better at the same weight dimensions.

The big thing is that sparse models allow you to train models with significantly larger dimensionality, blowing up the dimensions by several orders of magnitude. That more dimensions lead to better results does not seem to be under much contention; the open questions are more about quantifying the effect. It simply has not been shown experimentally yet, because the hardware to train such models is not there.
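
As a rough back-of-the-envelope sketch of that scaling (my own toy numbers, nothing measured): if you hold the non-zero weight count fixed, every 100x drop in density lets the side length of a square weight matrix grow by 10x.

    nnz_budget = 4096 * 4096  # ~16.8M weights, i.e. one dense 4096 x 4096 layer

    for density in (1.0, 0.01, 0.0001):
        # largest square layer whose non-zero count stays within the budget
        dim = round((nnz_budget / density) ** 0.5)
        print(f"density {density:g}: {dim} x {dim} layer holds {nnz_budget:,} non-zero weights")

    # density 1      -> 4096 x 4096
    # density 0.01   -> 40960 x 40960    (10x the width for the same weight budget)
    # density 0.0001 -> 409600 x 409600  (100x the width)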

The big thing is that sparse models allow you to train models with significantly larger dimensionality, blowing up the dimensions by several orders of magnitude.

Do you have any evidence to support this statement? Or are you imagining some not-yet-invented algorithms running on some not-yet-invented hardware?

  • Sparse matrices can increase in dimension while keeping the same number of non-zeros; that part is self-evident (there is a toy sketch after the reference below). Sparse-weight models can be trained: you are probably already aware of RigL and SRigL, and there is other related work on both unstructured and structured sparse training. You could argue that those adapt their algorithms to be executable on GPUs, and that none of them train at x100 or x1000 the dimensions. Yes, that is the part that requires access to sparse compute hardware acceleration, which today exists only as prototypes [1] or as extremely expensive systems (Cerebras).

    [1] https://dl.acm.org/doi/10.1109/MM.2023.3295848
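
    To make the first sentence concrete, here is a minimal scipy.sparse sketch (my own toy example, unrelated to [1]): CSR storage tracks the number of non-zeros, so the matrix dimension can grow by orders of magnitude at roughly constant memory, while the dense equivalent of the largest case could not even be stored.

      import numpy as np
      from scipy import sparse

      rng = np.random.default_rng(0)
      nnz = 1_000_000  # fixed budget of non-zero weights

      for dim in (10_000, 100_000, 1_000_000):
          rows = rng.integers(0, dim, size=nnz)
          cols = rng.integers(0, dim, size=nnz)
          vals = rng.standard_normal(nnz).astype(np.float32)
          w = sparse.coo_matrix((vals, (rows, cols)), shape=(dim, dim)).tocsr()
          mib = (w.data.nbytes + w.indices.nbytes + w.indptr.nbytes) / 2**20
          print(f"{dim:>9} x {dim:<9}  nnz ~ {w.nnz:,}  storage ~ {mib:.0f} MiB")

      # Storage stays around ~10 MiB in all three cases; a dense float32
      # 1,000,000 x 1,000,000 matrix would need ~4 TB before you even start training.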

    • Unstructured sparsity cannot be implemented efficiently in hardware if you still want to do matrix multiplication (the sketch below shows where the irregularity comes from). If you don’t want to do matrix multiplication, you first need to come up with new algorithms, tested in software. This reminds me of what Numenta tried to do with their SDRs; note that they didn’t quite succeed.
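
      To spell out that point, here is a naive CSR matrix-vector product written in plain Python (a toy sketch, names are mine): the gather through the column-index array is data-dependent, which is the irregular access pattern that dense matmul units cannot exploit and that structured schemes such as NVIDIA's 2:4 pattern exist to avoid.

        def csr_spmv(indptr, col_idx, vals, x):
            """y = A @ x for a CSR matrix A, written naively to expose the access pattern."""
            y = [0.0] * (len(indptr) - 1)
            for i in range(len(y)):
                acc = 0.0
                for j in range(indptr[i], indptr[i + 1]):
                    # x[col_idx[j]] is an indirect, data-dependent gather: the
                    # hardware cannot know which entries of x it will need next,
                    # so the wide, regular loads a matmul unit expects disappear.
                    acc += vals[j] * x[col_idx[j]]
                y[i] = acc
            return y

        # 2x3 matrix [[1, 0, 2], [0, 3, 0]] in CSR form, multiplied by the all-ones vector:
        print(csr_spmv([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))  # [3.0, 3.0]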

      1 reply →