Comment by thesz

21 hours ago

SparseCores appear to be block-sparse as opposed to element-sparse. They use 8- and 16-wide vectors to compute.

Here's another inference-efficient architecture where TPUs are useless: https://arxiv.org/pdf/2210.08277

There is no matrix-vector multiplication, which is the operation TPUs are built to accelerate. Parameters are estimated using Gumbel-Softmax.
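To make the Gumbel-Softmax part concrete, here is a rough sketch of how a relaxed choice among candidate gates could look. This is not the paper's code: the four-gate set, the names `gumbel_softmax`, `GATES` and `soft_gate`, and the temperature `tau` are all assumptions made for illustration.

```python
import math
import random

def gumbel_softmax(logits, tau=1.0, rng=random):
    """Draw one relaxed one-hot sample over the categories scored by `logits`."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise via inverse transform; clamp u away from 0 and 1.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        g = -math.log(-math.log(u))
        noisy.append((logit + g) / tau)
    # Numerically stable softmax over the perturbed logits.
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

# Real-valued relaxations of a few two-input gates (hypothetical gate set).
# For inputs in [0, 1] each one matches its Boolean gate at the corners.
GATES = {
    "and": lambda a, b: a * b,
    "or": lambda a, b: a + b - a * b,
    "xor": lambda a, b: a + b - 2 * a * b,
    "nand": lambda a, b: 1 - a * b,
}

def soft_gate(a, b, logits, tau=1.0, rng=random):
    """Mix the candidate gates using one Gumbel-Softmax sample as weights."""
    weights = gumbel_softmax(logits, tau, rng)
    return sum(w * gate(a, b) for w, gate in zip(weights, GATES.values()))
```

The learnable parameters are the per-gate logits. Once training settles on one gate per node, the node can be replaced by that hard gate, and nothing in the forward pass is a matrix product.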

Inference is done bit-wise, and the most efficient inference comes after applying Boolean logic simplification tools (ABC or mockturtle).
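A small sketch of what bit-wise inference means in practice, assuming one input bit per sample packed into a machine word so that each gate evaluates all samples at once. The circuits and the names `pack_bits`, `unpack_bits`, `tiny_network`, `redundant` and `simplified` are made up for illustration. The last two only show the kind of rewrite a logic optimizer performs; they do not call ABC or mockturtle.

```python
def pack_bits(bits):
    """Pack a list of 0/1 values into one int; bit i holds sample i."""
    word = 0
    for i, bit in enumerate(bits):
        word |= (bit & 1) << i
    return word

def unpack_bits(word, n):
    """Inverse of pack_bits for the low n bits."""
    return [(word >> i) & 1 for i in range(n)]

def tiny_network(a, b, c, mask):
    """Hypothetical circuit (a AND b) XOR (NOT c), evaluated on every packed sample."""
    t = a & b
    # ~c is negative in Python, so mask back down to the sample width.
    return (t ^ ~c) & mask

def redundant(a, b, mask):
    """Unsimplified form: (a AND b) OR (a AND NOT b)."""
    return ((a & b) | (a & ~b)) & mask

def simplified(a, b, mask):
    """Same function after simplification: just a."""
    return a & mask

n = 4
mask = (1 << n) - 1
a = pack_bits([0, 1, 1, 1])
b = pack_bits([0, 0, 1, 1])
c = pack_bits([1, 1, 0, 1])
result = unpack_bits(tiny_network(a, b, c, mask), n)

# Packed truth table over all four (a, b) input combinations.
ta, tb = 0b1100, 0b1010
same = redundant(ta, tb, 0b1111) == simplified(ta, tb, 0b1111)
```

Each Python-level `&`, `|` or `^` here handles every packed sample in one operation, which maps directly onto ordinary CPU integer instructions. Removing gates through simplification cuts the instruction count without changing the output.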

In my (not so) humble opinion, TPUs are an example of premature optimization.