Comment by bastawhiz

17 hours ago

Many labs use TPUs, but not exclusively. Most labs need more compute than they can get, and if there's TPU capacity, they'll adapt their systems to be able to run partially on TPUs.

Even Google doesn't use TPUs exclusively.

  • Google is in a different position to others in that they're the only frontier lab with a cloud infra business. It obviously makes sense to sell GPUs on cloud infra as people want to rent them. In that respect Google buys a ton of GPUs to rent out.

    What's unclear to me is how much Google uses GPUs for its own stuff. Yes, Gemini runs on GPUs now, so that Google can sell Gemini on-prem boxes (a release announced last week), but is any training or inference for Gemini really happening on GPUs? That's unclear to me. I'd have guessed not, given that I thought TPUs were much cheaper to operate, but maybe I'm wrong.

    Caveat: I work at Google, but not on anything to do with this. I'm only going on what's in the press here.

Why is AMD not more popular, then, if labs are so flexible about moving away from CUDA?

  • People are trying, especially for inference. For training, I think it's just too high-risk: a bad stack can tank your whole training run.

    TPUs are at least dogfooded by Google DeepMind; no team, AFAIK, has gotten the AMD stack to train well.