TPU transformation: A look back at 10 years of our AI-specialized chips

1 year ago (cloud.google.com)

It's crazy that Google doesn't spin out their TPU work as a separate company.

TPUs are the second most widely used environment for training after Nvidia. They're the only environment outside CUDA that people build optimized kernels for.
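
One common way people write that kind of non-CUDA kernel these days is JAX's Pallas extension. A minimal sketch of an elementwise TPU kernel, purely illustrative (the names and shapes are mine, not from the article):

    import jax
    import jax.numpy as jnp
    from jax.experimental import pallas as pl

    def add_kernel(x_ref, y_ref, o_ref):
        # Each ref is a block staged into the TPU's on-chip memory;
        # writing to o_ref produces the kernel's output block.
        o_ref[...] = x_ref[...] + y_ref[...]

    @jax.jit
    def add(x, y):
        return pl.pallas_call(
            add_kernel,
            out_shape=jax.ShapeDtypeStruct(x.shape, x.dtype),
        )(x, y)

    x = jnp.arange(8, dtype=jnp.float32)
    print(add(x, x))  # [ 0.  2.  4.  6.  8. 10. 12. 14.]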

If it were separate from Google, there are a bunch of companies that would happily spend some money on a real, working Nvidia alternative.

It might be profitable from day one, and it surely would gain substantial market capitalization - Alphabet shareholders should be agitating for this!

  • People bring this point up here every couple of weeks. The cost competitiveness of TPUs for Google comes exactly from the fact that they make them in-house and don't sell them. They don't need sales channels, support, leads, any of that stuff. They can design for exactly one software stack, one hardware stack, and one set of staff. You cannot just magically spin up a billion-dollar hardware company overnight with software, customers, sales channels, support, etc.

    Nvidia has spent 20 years on this, which is why they're good at it.

    > If it were separate from Google, there are a bunch of companies that would happily spend some money on a real, working Nvidia alternative.

    Unfortunately, most people don't actually care about Nvidia alternatives -- they care about price above all else. People will say they want Nvidia alternatives and will support them, then go back to buying Nvidia the moment the price drops. Which is fine, to be clear, but it is not the outcome people often allude to.

    • You can, or at least historically could, buy access to TPUs, and request it for non-profit projects through the TPU Research Cloud programme. Certainly you have been able to pay for a Colab Pro membership to get TPU access (a quick runtime check is sketched below), which is how many of the AI generation before ChatGPT learned to run AI. TPUs, however, were always geared toward training, never inference.
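
      On the Colab point: a quick way to confirm you actually got a TPU runtime, assuming a JAX environment (a minimal sketch, not from the article):

        import jax

        # On a Colab TPU runtime this lists TpuDevice entries;
        # on a CPU-only runtime it falls back to CpuDevice.
        print(jax.devices())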

      2 replies →

    • > You cannot just magically spin up a billion-dollar hardware company overnight with software, customers, sales channels, support, etc.

      Not saying it is easy or can be done magically.

      Just noting that Groq (founded by one of the TPU's creators) did exactly this.

      3 replies →

  • The TPUs are highly integrated with the rest of the internal Google ecosystem, both hardware and software. Untangling that would be ... interesting.

    • We have a perfectly reasonable blueprint for an ML accelerator that isn't tied into the Google ecosystem: Nvidia's entire product line.

      Between that and the fact that Google already sells "Coral Edge TPUs" [1], I'd think they could manage to untangle things.

      Whether the employees would want to be spun off or not is a different matter, of course...

      [1] https://coral.ai/products/

      3 replies →

    • Knowing what I know about big corporations, the biggest entanglements are going to be IP ownership, political constraints, and promises to shareholders.

  • There would probably be huge demand, but would Google be able to satisfy it? Is it currently able to satisfy its own demand?

    • That would be the point of spinning it out. They could have an IPO, raise as much capital as there is in the observable Universe, and build enough fabs to satisfy all the demand.

      15 replies →

  • > It's crazy that Google doesn't spin out their TPU work as a separate company.

    Not really. Google TPUs require Google-specific infrastructure and cannot be deployed outside Google datacenters. The software is Google-specific; the monetization model is Google-specific.

    We also have no idea how profitable TPUs would actually be as a separate company. The only customers for TPUs are Google and Google Cloud.

Impressive: “Overall, more than 60% of funded generative AI startups and nearly 90% of gen AI unicorns use Google Cloud’s AI infrastructure, including Cloud TPUs.”

  • Doesn’t Google Cloud’s AI infrastructure include Colab? That’s useful for so many things.

  • Google will also offer GCP credits for free Nvidia GPUs with almost no questions asked.

    AWS and Azure (to a lesser extent) can also make this argument.

  • Use does not mean heavy reliance. If an AI startup uses Google Colab or runs one POC with TPUs, it would fall under this stat.

Apple Intelligence uses Google TPUs instead of GPUs.

  • That's not surprising, given JG's and Ruoming's Google stints.

    Google is going to dominate the LLM-ushered AI era. Google has been AI-first since 2016; they just haven't had the opening. Sam, inept at engineering, just has no idea how to navigate the delicate biz & eng competition.

  • For training, yes, but there's no indication about inference workloads. Apple has said it would use its own silicon for inference in the cloud.

    • Plus the Apple "Neural Engine", which has shipped on millions of iPhones for local inference.

The real winner here is the marketing department, who managed to frame this article as a "celebration of successes" when in fact the TPU is yet another case of Google having a mile-long lead and then... squandering it. And no, "it's on our cloud and Pixel phones" doesn't cut it at this level.

  • I have a strong suspicion that previous generations of TPU were not cost-effective for decent AI, which would explain Google's reluctance to release complex models. They have had superior translation for years, for example. But scaling it up to the world's population? Not possible with TPUs.

    It was OpenAI that showed you can actually deploy a large model, like GPT-4, to a large audience. Maybe Google, with only internal use, never reached the cost efficiency that NVIDIA does.

    • Google used to have superior translation, but that hasn't been the case for years now. In my experience, DeepL (https://www.deepl.com/) is vastly superior, especially for even slightly more niche languages. I'm a native Finnish speaker and regularly use DeepL to translate Finnish into English when I don't want to do it by hand, and the quality is just way beyond anything Google can do. I've had similar experiences with languages I'm less proficient in but still understand to an extent, such as French or German.

    • I suspect it had much more to do with a lack of product-market fit. They spent 10 years faking demos and dreaming about what they thought AI could eventually do, but since it never worked, the products were never released, and so they never expanded. A well-optimized TPU will always beat a well-optimized GPU on efficiency.

      1 reply →