Comment by llm_nerd

6 days ago

TPUs are accelerators for the common operations found in neural nets. A big part is simply a massive number of matrix FMA units to process enormous matrix operations, which make up the bulk of a forward pass through a model. Caching enhancements and massively growing memory were necessary to facilitate transformers, but on the hardware side not a huge amount has changed: the fundamentals from years ago still power the latest models. The hardware is just getting faster, with more memory and more parallel processing units, and later picked up more data types to enable hardware-supported quantization.
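
To make the "it's mostly matmuls" point concrete, here's a minimal JAX sketch (JAX compiles to TPUs via XLA). Shapes and names are illustrative, not any real model; bfloat16 is one of the reduced-precision dtypes TPUs support natively:

    import jax
    import jax.numpy as jnp

    def forward(params, x):
        # Each layer is a matmul plus a cheap elementwise op; the matmuls
        # dominate the FLOP count and run on the matrix units.
        h = jax.nn.relu(x @ params["w1"])
        return h @ params["w2"]

    key = jax.random.PRNGKey(0)
    k1, k2, k3 = jax.random.split(key, 3)
    params = {
        "w1": jax.random.normal(k1, (512, 1024), dtype=jnp.bfloat16),
        "w2": jax.random.normal(k2, (1024, 256), dtype=jnp.bfloat16),
    }
    x = jax.random.normal(k3, (32, 512), dtype=jnp.bfloat16)
    y = jax.jit(forward)(params, x)  # compiled by XLA for the accelerator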

So it isn't like Google designed a TPU for a specific model or architecture. They're pretty general purpose in a narrow field (oxymoron, but you get the point).

The set of operations Google designed into a TPU is very similar to what Nvidia did, and it's about as broadly capable. But Google owns the IP, doesn't pay Nvidia's premium, and gets to design for its own specific needs.

There are plenty of matrix multiplies in the backward pass too. Obviously that matters less when serving, but it's useful for training; a small sketch below.
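
For instance, differentiating even a single linear layer yields more matmuls (grad_w = x.T @ dy, grad_x = dy @ w.T), so training hits the same matrix units as inference. A toy JAX sketch, with illustrative names and shapes:

    import jax
    import jax.numpy as jnp

    def loss(w, x):
        return jnp.sum(x @ w)  # forward pass: one matmul

    key = jax.random.PRNGKey(0)
    kx, kw = jax.random.split(key)
    x = jax.random.normal(kx, (32, 512))
    w = jax.random.normal(kw, (512, 256))

    grad_w = jax.grad(loss)(w, x)   # autodiff's backward pass

    # Written out by hand for this loss, the gradient is itself a matmul:
    dy = jnp.ones((32, 256))        # d(sum)/d(output)
    assert jnp.allclose(grad_w, x.T @ dy)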