Comment by whimsicalism
6 months ago
> The TPU is Google's custom ASIC for accelerating the inference phase of neural network computations.
this seems hopelessly out of date/confused
They're not confused at all, this is just a (correct) description of TPU v1. The repository is 8 years old.
The abstract of Google's 2017 paper describes it the same way.
what's the memory bandwidth? IIRC that is the limiting factor in LLM hardware today
Slide 21, https://files.futurememorystorage.com/proceedings/2024/20240...
hence the out-of-date part of my comment
Recent (2024) description by Google, https://cloud.google.com/blog/transform/ai-specialized-chips...
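A rough back-of-envelope for why memory bandwidth dominates LLM decode (the model size and bandwidth figures below are illustrative assumptions, not taken from the slide):

    # Back-of-envelope: batch-1 decode has to read every weight once per token.
    # Assumed, illustrative numbers:
    params = 70e9            # a 70B-class model
    bytes_per_param = 2      # bf16 / fp16 weights
    hbm_bandwidth = 3.3e12   # bytes/s of HBM on a modern accelerator

    bytes_per_token = params * bytes_per_param            # ~140 GB read per decoded token
    tokens_per_second = hbm_bandwidth / bytes_per_token   # ~24 tokens/s upper bound

    print(f"~{tokens_per_second:.0f} tokens/s per chip, ignoring compute and KV cache")

So for single-stream decode the ceiling is set almost entirely by how fast you can stream weights, not by FLOPs.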
How would you describe it instead? Curious and learning
Google does everything, both inference and training, on their TPUs.
Inference is easier, since the person deploying a model knows the architecture ahead of time and therefore can write custom code for their particular model.
When training, you want to be as flexible as possible. The framework and hardware should not impose any particular architecture. This means supporting lots of kernels and combinations of kernels. Miss one and you're out.
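A minimal JAX sketch of that contrast (the tiny model and shapes below are made up for illustration): for inference you can compile one fixed forward function whose architecture and shapes are known ahead of time, while training also needs a correct backward kernel for every op that jax.grad walks through.

    import jax
    import jax.numpy as jnp

    def forward(params, x):
        # Fixed, known architecture: one hidden layer with a GELU.
        h = jax.nn.gelu(x @ params["w1"])
        return h @ params["w2"]

    # Inference: the deployer knows the exact model, so one specialized,
    # jit-compiled function is enough.
    serve = jax.jit(forward)

    # Training: autodiff traces every primitive in the graph, so the stack must
    # supply forward *and* backward kernels for each op and each combination.
    def loss(params, x, y):
        return jnp.mean((forward(params, x) - y) ** 2)

    train_grad = jax.jit(jax.grad(loss))

    key = jax.random.PRNGKey(0)
    params = {
        "w1": jax.random.normal(key, (16, 32)),
        "w2": jax.random.normal(key, (32, 8)),
    }
    x = jnp.ones((4, 16))
    y = jnp.zeros((4, 8))
    print(serve(params, x).shape)                 # (4, 8)
    print(train_grad(params, x, y)["w1"].shape)   # (16, 32)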
> Miss one and you're out.
Well, these days, since everything is a transformer, your pool of choices is less daunting and there are only about four or five places where someone might get clever.
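For a sense of where those four or five places usually are, here is a skeletal transformer block (pure illustration, not any particular model); the comments mark the spots where custom kernels typically show up.

    import jax
    import jax.numpy as jnp

    def transformer_block(x, p):
        # 1. Normalization (RMSNorm / LayerNorm), often fused.
        h = x * jax.lax.rsqrt(jnp.mean(x * x, axis=-1, keepdims=True) + 1e-6)
        # 2. Attention: the big one (FlashAttention-style fused kernels).
        q, k, v = h @ p["wq"], h @ p["wk"], h @ p["wv"]
        scores = jax.nn.softmax(q @ k.swapaxes(-1, -2) / jnp.sqrt(q.shape[-1]), axis=-1)
        x = x + (scores @ v) @ p["wo"]
        # 3. Normalization again before the MLP.
        h = x * jax.lax.rsqrt(jnp.mean(x * x, axis=-1, keepdims=True) + 1e-6)
        # 4. Gated MLP / activation (SwiGLU-style), another common fusion target.
        h = jax.nn.silu(h @ p["w_gate"]) * (h @ p["w_up"])
        out = x + h @ p["w_down"]
        # 5. (Not shown) the embedding lookup and the final logits / sampling step.
        return out

    d, f, seq = 8, 16, 4
    key = jax.random.PRNGKey(0)
    p = {name: jax.random.normal(key, shape) for name, shape in [
        ("wq", (d, d)), ("wk", (d, d)), ("wv", (d, d)), ("wo", (d, d)),
        ("w_gate", (d, f)), ("w_up", (d, f)), ("w_down", (f, d)),
    ]}
    print(transformer_block(jnp.ones((seq, d)), p).shape)  # (4, 8)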