Comment by thesz
21 hours ago
SparseCores appear to be block-sparse as opposed to element-sparse. They use 8- and 16-wide vectors to compute.
Here's another inference-efficient architecture where TPUs are useless: https://arxiv.org/pdf/2210.08277
There is no matrix-vector multiplication. Parameters are estimated using Gumbel-Softmax. TPUs are of no use here.
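For anyone who hasn't met Gumbel-Softmax: a minimal sketch of drawing one relaxed sample over a set of discrete choices, for example which gate a node should use. This is the generic trick, not code from the linked paper; the function name, the `tau` default and the 16-way choice are my own assumptions for illustration.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Draw one relaxed (soft) one-hot sample over len(logits) choices.

    tau is the temperature: lower values push the sample closer to one-hot.
    """
    noisy = []
    for logit in logits:
        u = rng.random()
        # Clamp away from 0 and 1 so both logarithms stay finite.
        u = min(max(u, 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / tau)
    # Numerically stable softmax over the perturbed logits.
    top = max(noisy)
    exps = [math.exp(x - top) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical usage: soft weights over 16 candidate two-input gates.
weights = gumbel_softmax([0.0] * 16, tau=0.5)
```

The only arithmetic here is elementwise noise plus a softmax over a handful of choices per node.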
Inference is done bit-wise, and the most efficient inference comes after running the network through Boolean logic simplification tools (ABC or mockturtle).
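To make "bit-wise" concrete, here is a rough sketch of how I'd picture it, assuming one bit lane per input sample packed into a machine word. The tiny network, the gate table and the helper names are all made up for illustration and are not taken from the paper, and the logic simplification step is not shown.

```python
MASK = (1 << 64) - 1  # emulate a 64-bit word: one bit lane per sample

# Each gate acts on whole words, so one operation covers every packed sample.
GATES = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
    "nand": lambda a, b: ~(a & b) & MASK,
}


def pack(bits):
    """Pack up to 64 single-bit values into one integer, sample i in bit i."""
    word = 0
    for i, bit in enumerate(bits):
        word |= (bit & 1) << i
    return word


def run(network, inputs):
    """Evaluate gates in order; each gate reads two earlier wires by index."""
    wires = list(inputs)
    for op, a, b in network:
        wires.append(GATES[op](wires[a], wires[b]))
    return wires[-1]


# Hypothetical circuit: out = (x0 AND x1) XOR x2, run on all 8 input combos.
samples = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
x0 = pack([s[0] for s in samples])
x1 = pack([s[1] for s in samples])
x2 = pack([s[2] for s in samples])
network = [("and", 0, 1), ("xor", 3, 2)]  # wire 3 is the AND gate's output
out = run(network, [x0, x1, x2])
```

Two word-level operations evaluate the circuit for every packed sample at once, with no multiplications anywhere.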
In my (not so) humble opinion, TPUs are a textbook case of premature optimization.
They are on their 7th generation now, so presumably the architecture is being updated as needs require.