Comment by Der_Einzige
15 hours ago
Google did not use TPUs for literally every bit of compute that led to Gemini. GCP has millions of high-end Nvidia GPUs, and programming for them is an order of magnitude easier, even for Googlers.
Any claim from Google that all of Gemini (including previous experiments) was trained entirely on TPUs is a lie. What they are truthfully saying is that the final training run was done entirely on TPUs. The market shouldn't react heavily to this; it should instead react positively to the fact that Google is now finally selling TPUs externally and their fab yields are better than expected.
> including all previous experiments
How far back do you go? What about experiments into architecture features that didn’t make the cut? What about pre-transformer attention?
But more generally, why are you so sure that the team that built Gemini didn't exclusively use TPUs while they were developing it?
I think that one of the reasons Gemini caught up so quickly is that they have so much compute at a fraction of the price everyone else pays.
Why should it not react heavily? What's stopping this from being the start of a trend for Google, and even for Amazon?
They are not lies.
JAX is very easy to use. Give it a try.
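For a sense of what it looks like, here's a minimal, self-contained sketch (purely illustrative, nothing to do with Gemini's actual training code): a jit-compiled forward pass and its gradient, which run unchanged on CPU, GPU, or TPU backends.

```python
# Minimal JAX sketch: jit-compiled prediction + gradient.
# Illustrative only; names (predict, loss) are made up for this example.
import jax
import jax.numpy as jnp

@jax.jit  # compiled with XLA; same code targets CPU, GPU, or TPU
def predict(params, x):
    w, b = params
    return jnp.tanh(x @ w + b)

def loss(params, x, y):
    # mean squared error between predictions and targets
    return jnp.mean((predict(params, x) - y) ** 2)

# jax.grad differentiates the scalar loss w.r.t. params
grad_fn = jax.jit(jax.grad(loss))

key = jax.random.PRNGKey(0)
w = jax.random.normal(key, (4, 1))
b = jnp.zeros((1,))
x = jax.random.normal(key, (8, 4))
y = jnp.ones((8, 1))

grads = grad_fn((w, b), x, y)  # tuple of gradients for (w, b)
```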