Comment by fooker

2 months ago

That's exactly what Nvidia is doing with tensor cores.

Except the native width of Tensor Cores is about 8-32 (depending on scalar type), whereas the width of TPUs is up to 256. The difference in scale is massive.
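To put rough numbers on that gap, here is a back-of-the-envelope sketch. It assumes "width" means the side of a square multiply-accumulate array, which is my simplification and not how either vendor specifies its hardware; `mac_units` is a hypothetical helper, not a real API.

```python
def mac_units(width: int) -> int:
    """Multiply-accumulate units in a hypothetical square width x width array."""
    return width * width


# Widths taken from the comment above: 8-32 for Tensor Cores, up to 256 for TPUs.
for width in (8, 32, 256):
    print(f"width {width:>3}: {mac_units(width):>6} MAC units")

# How many times more MAC units the widest array has (assumed square layout).
ratio_vs_32 = mac_units(256) // mac_units(32)
ratio_vs_8 = mac_units(256) // mac_units(8)
print(f"256 vs 32: {ratio_vs_32}x, 256 vs 8: {ratio_vs_8}x")
```

Under that assumption the unit count grows with the square of the width, so an 8x difference in width becomes a 64x difference in MAC units per array. It says nothing about how many such units sit on a chip or how well they are utilized.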