Comment by djsjajah

17 hours ago

> including all previous experiments

How far back do you go? What about experiments into architecture features that didn’t make the cut? What about pre-transformer attention?

But more generally, why are you so sure that the team that built Gemini didn’t exclusively use TPUs while they were developing it?

I think one of the reasons Gemini caught up so quickly is that they have so much compute at a fraction of the price everyone else pays.