Comment by HarHarVeryFunny

1 day ago

TPUs do include dedicated hardware, SparseCores, for sparse operations.

https://docs.cloud.google.com/tpu/docs/system-architecture-t...

https://openxla.org/xla/sparsecore

SparseCores appear to be block-sparse rather than element-sparse: they compute with 8- and 16-wide vectors.
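To illustrate the distinction, here's a minimal sketch of block-sparse storage (not TPU code; the tile layout and 8-wide block size are assumptions for illustration): only dense 8x8 tiles are stored, so zeros are skipped at tile granularity, which maps well onto fixed-width vector units.

```python
import numpy as np

BLOCK = 8  # assumed tile width, matching the 8-wide vectors mentioned above

def block_sparse_matvec(tiles, tile_cols, n_block_rows, x):
    """y = A @ x, with A stored as dense BLOCKxBLOCK tiles per block-row."""
    y = np.zeros(n_block_rows * BLOCK)
    for bi in range(n_block_rows):
        for tile, bj in zip(tiles[bi], tile_cols[bi]):
            # each tile is a small dense matmul: vector-unit friendly
            y[bi*BLOCK:(bi+1)*BLOCK] += tile @ x[bj*BLOCK:(bj+1)*BLOCK]
    return y

# toy example: 16x16 matrix with only two nonzero 8x8 tiles
rng = np.random.default_rng(0)
t0, t1 = rng.random((8, 8)), rng.random((8, 8))
A = np.zeros((16, 16))
A[0:8, 8:16] = t0
A[8:16, 0:8] = t1
x = rng.random(16)
y = block_sparse_matvec([[t0], [t1]], [[1], [0]], 2, x)
assert np.allclose(y, A @ x)
```

Element-sparse formats (one nonzero at a time) would instead need gather/scatter per element, which is far less regular for wide vector hardware.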

Here's another inference-efficient architecture where TPUs are useless: https://arxiv.org/pdf/2210.08277

There is no matrix-vector multiplication. Parameters are estimated using Gumbel-Softmax. TPUs are of no use here.
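For readers unfamiliar with the trick: Gumbel-Softmax gives a differentiable relaxation of sampling from a categorical distribution, so discrete parameters can be trained with ordinary gradients. A minimal sketch (the temperature value is an arbitrary choice, not from the paper):

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=np.random.default_rng(0)):
    # Gumbel(0,1) noise: -log(-log(U)) with U ~ Uniform(0,1)
    u = rng.uniform(size=logits.shape)
    g = -np.log(-np.log(u))
    # softmax of perturbed logits; tau -> 0 approaches a one-hot sample
    z = (logits + g) / tau
    e = np.exp(z - z.max())
    return e / e.sum()

probs = gumbel_softmax(np.array([2.0, 0.5, -1.0]), tau=0.5)
assert np.isclose(probs.sum(), 1.0)
```

At low temperature the output is nearly one-hot, which is what lets the trained relaxation be discretized into the boolean form used at inference time.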

Inference is done bit-wise, and the most efficient inference is obtained after applying boolean logic simplification algorithms (ABC or mockturtle).
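Bit-wise inference can be bit-sliced: pack one bit per sample into a machine word so a single bitwise op evaluates one gate for 64 samples at once. A toy sketch (the circuit here is made up for illustration; a real pipeline would emit the gate list from the learned network and then simplify it with ABC or mockturtle):

```python
import numpy as np

rng = np.random.default_rng(1)
# three input features, 64 samples each, one bit per sample per word
x0, x1, x2 = (rng.integers(0, 2**63, dtype=np.uint64) for _ in range(3))

# toy circuit: out = (x0 AND x1) XOR (NOT x2) -- one op covers 64 samples
out = (x0 & x1) ^ ~x2

# sanity check: bit 5 of the packed result matches scalar evaluation
b = lambda w: (int(w) >> 5) & 1
assert b(out) == ((b(x0) & b(x1)) ^ (1 - b(x2)))
```

No multiply units are involved at all, which is the sense in which a TPU's systolic matmul array buys nothing for this architecture.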

In my (not so) humble opinion, TPUs are an example of premature optimization.