Comment by bjourne

1 day ago

Except the native width of Tensor Cores is about 8-32 (depending on scalar type), whereas the width of TPUs is up to 256. The difference in scale is massive.

2 comments

fooker 15 hours ago

If it turns out to be useful, can't Nvidia just tweak a parameter in their Verilog and declare victory?

If not, what's fundamentally difficult about doing 32 vs 256 here?

saagarjha 18 hours ago

Nobody cares about width; they care about TFLOPS.
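
For anyone wanting to put rough numbers on the width vs. TFLOPS point, here is a back-of-the-envelope sketch. It is not a model of any real chip. It assumes each matrix unit is a square width x width grid doing one multiply-accumulate per cell per cycle (counted as 2 FLOPs), and the function names, the 700 MHz clock and the unit counts are all made up for illustration.

```python
# Back-of-the-envelope only: every unit is assumed to be a square
# width x width MAC array, one MAC per cell per cycle, 2 FLOPs per MAC.
# Clock speed and unit counts below are illustrative assumptions.

def peak_tflops(width: int, units: int, clock_hz: float) -> float:
    """Peak dense throughput, in TFLOPS, of `units` identical arrays."""
    return 2 * width * width * units * clock_hz / 1e12


def units_to_match(small_width: int, large_width: int) -> int:
    """How many small arrays have as many MAC cells as one large array."""
    return (large_width // small_width) ** 2


if __name__ == "__main__":
    clock = 700e6  # hypothetical clock, same for both designs

    one_wide = peak_tflops(width=256, units=1, clock_hz=clock)
    n_small = units_to_match(small_width=16, large_width=256)
    many_narrow = peak_tflops(width=16, units=n_small, clock_hz=clock)

    print(f"1 x 256-wide array:        {one_wide:.2f} TFLOPS")
    print(f"{n_small} x 16-wide arrays:      {many_narrow:.2f} TFLOPS")
```

Under these assumptions, one 256-wide array and 256 separate 16-wide arrays have the same peak number, which is the TFLOPS argument. What the sketch leaves out is what the width argument is about: a wider array reuses each loaded operand across more cells, so the two designs can differ a lot in the bandwidth and control overhead needed to actually reach that peak.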