Comment by lanthissa

5 months ago

can someone help me understand how the following can be true:

1. TPUs are a serious competitor to Nvidia chips.

2. Chip makers with the best chips are valued at 1-3.5T.

3. Google's market cap is 2T.

4. It is correct for Google to not sell TPUs.

I have heard the whole "it's better to rent them" thing, but if they're actually good, selling them is almost as good a business as every other part of the company.

Wall Street undervalued Google even on day one (IPO). Bezos has said that some of the times the stock was doing the worst were when the company was doing great.

So, to help you understand how they can be true: market cap is governed by something other than what a business is worth.

As an aside, here's a fun article that embarrasses Wall Street. [0]

[0] https://www.nbcnews.com/id/wbna15536386

  • I remember sitting around the lunch table in a tech company when Google IPO'd and none of us understood the IPO valuation. I didn't buy any stock. I didn't get "cloud" either. Sometimes new business is essentially created out of thin air. Google's and Amazon's valuations did not increase only due to their efforts; they also increased because the broader market shifted.

    I guess that means don't take investment advice from me ;) I've done OK buying indices though.

  • The fact that Wall Street sometimes misses revolutions and misvalues stocks does not mean that all perceived misvalued stocks are revolutionary.

    Plenty of companies have screwed up execution, and the market has correctly noticed and penalized them for that.

  • Reading through this news article is hilarious.

    P.S. I did not have access to the internet in 2006, so I guess the skepticism was normal at the time.

Selling them and supporting that in the field requires quite some infrastructure you'd have to build. Why go through all that trouble if you already make higher margins renting them out?

Also, if they are so good, it's best to not level the playing field by sharing that with your competitors.

Also "chip makers with the best chips" == Nvidia, there aren't many others. And Alphabet does more than just produce TPUs.

  • Does Google Cloud offer them on an "AWS Outposts"-style model? I think that plus cloud access is probably the easiest and best way to offer them. The last thing you need to be dealing with is Supermicro, Gigabyte, etc. building a box for them and so on. I can definitely understand not selling the raw chip.

Nvidia is selling a ton of chips on hype.

Google is saving a ton of money by making TPUs, which will pay off in the future when AI is better monetized, but so far no one is directly making a massive profit from foundation models. It's a long-term play.

Also, I'd argue Nvidia is massively overvalued.

  • Common in gold rushes, but then they are selling chips. Are they overvalued? Maybe. Are they profitable (something WeWork and Uber aren't)? Yes, quite.

nvidia, who make AI chips with kinda good software support, and who have sales reflecting that, is worth 3.5T

google, who make AI chips with barely-adequate software, is worth 2.0T

AMD, who also make AI chips with barely-adequate software, is worth 0.2T

Google made a few decisions with TPUs that might have made business sense at the time, but with hindsight haven't helped adoption. They closely bound TPUs with their 'TensorFlow 1' framework (which was kinda hard to use) then they released 'TensorFlow 2' which was incompatible enough it was just as easy to switch to PyTorch, which has TPU support in theory but not in practice.

They also decided TPUs would be Google Cloud only. Might make sense, if they need water cooling or they have special power requirements. But it turns out the sort of big corporations that have multi-cloud setups and a workload where a 1.5x improvement in performance-per-dollar is worth pursuing aren't big open source contributors. And understandably, the academics and enthusiasts who are giving their time away for free aren't eager to pay Google for the privilege.

Perhaps Google's market cap already reflects the value of being a second-place AI chipmaker?

  • JAX is very much working (and in my view better, aside from the lack of community) software support, especially if you use their images (which they do).

    > Tensorflow

    They have been using jax/flax/etc rather than tensorflow for a while now. They don't really use pytorch from what I see on the outside from their research works. For instance, they released siglip/siglip2 with flax linen: https://github.com/google-research/big_vision

    TPUs very much have software support, hence why SSI etc use TPUs.

    P.S. Google gives their tpus for free at: https://sites.research.google/trc/about/, which I've used for the past 6 months now

    • > They have been using jax/flax/etc rather than tensorflow for a while now

      Jax has a harsher learning curve than Pytorch in my experience. Perhaps it's worth it (yay FP!) but it doesn't help adoption.

      > They don't really use pytorch from what I see on the outside from their research works

      Of course not, there is no outside world at Google - if internal tooling exists for a problem their culture effectively mandates using that before anything else, no matter the difference in quality. This basically explains the whole TF1/TF2 debacle which understandably left a poor taste in people's mouths. In any case while they don't use Pytorch, the rest of us very much do.

      > P.S. Google gives their tpus for free at: https://sites.research.google/trc/about/, which I've used for the past 6 months now

      Right, and in order to use it effectively you basically have to use Jax. Most researchers don't have the advantage of free compute, so Google is effectively trying to buy mindshare rather than winning on quality. This is fine, but it's worth repeating as it biases the discussion heavily: many proponents of Jax just so happen to be on TRC or have been given credits for TPUs via some other mechanism.

Like other Google internal technologies, the amount of custom junk you'd need to support to use a TPU is pretty extreme, and the utility of the thing without the custom junk is questionable. You might as well ask why they aren't marketing their video compression cards.

Ironically, despite Google ultimately being an advertising company, it is the absolute worst company at advertising itself.

Aside from the specifics of Nvidia vs Google, one thing to note regarding company valuations is that not all parts of the company are necessarily additive. As an example (read: a thing I’m making up), consider something like Netflix vs Blockbuster back in the early days - once Blockbuster started to also ship DVDs, you’d think it’d obviously be worth more than Netflix, because they’ve got the entire retail operation as well, but that presumes the retail operation is actually a long-term asset. If Blockbuster has a bunch of financial obligations relating to the retail business (leases, long-term agreements with shippers and suppliers, etc), it can very quickly wind up that the retail business is a substantial drag on Blockbuster’s valuation, as opposed to something that makes it more valuable.

AMD and even the likes of Huawei also make somewhat acceptable chips, but using them is a bit of a nightmare. Is it a similar thing here? Using TPUs is more difficult, they only exist inside Google Cloud, etc.

I believe Broadcom is also very involved in the making of the TPUs and networking infrastructure, and they are valued at 1.2T currently. Maybe consider the combined value of Broadcom and Google.

If they think they’ve got a competitive advantage vs. GPUs which benefits one of their core products, it would make sense to retain that competitive advantage for the long term, no?

  • No. If they sell the TPUs for “what they’re worth”, they get to reap a portion of the benefit their competitors would get from them. There’s money they could be making that they aren’t.

    Or rather, there would be if TPUs were that good in practice. From the other comments it sounds like TPUs are difficult to use for a lot of workloads, which probably leads to the real explanation: No one wants to use them as much as Google does, so selling them for a premium price as I mentioned above won’t get them many buyers.

> can someone help me understand how the following can be true

You're conflating price with intrinsic value with market analysis. All different things.

Good questions. Below I attempt to respond to each point, then wrap it up. TLDR: even if TPU is good (and it is good for Google) it wouldn’t be “almost as good a business as every other part of their company” because the value add isn’t FROM Google in the form of a good chip design (TPU). Instead the value add is TO Google in the form of specific compute (ergo) that is cheap and fast FROM relatively simple ASICs (TPU chips) stitched together into massively complex systems (TPU super pods).

If interested in further details:

1) TPUs are a serious competitor to Nvidia chips for Google’s needs. Per the article, they are not nearly as flexible as a GPU (dependence on precompiled workloads, high usage of PEs in the systolic array). Thus for broad ML market usage, they may not be competitive with Nvidia GPUs/racks/clusters.

2) Chip makers with the best chips are not valued at 1-3.5T; per other comments to OC, only Nvidia and Broadcom are worth this much. These are not just “chip makers”, they are (the best) “system makers”, driving designs for the chips and interconnect required to go from a diced piece of silicon to a data center consuming MWs. This part is much harder, which is why Google (who design the TPU) still has to work with Broadcom to integrate their solution. Indeed every hyperscaler is designing chips and software for their needs, but every hyperscaler works with companies like Broadcom or Marvell to actually create a complete competitive system. Side note: Marvell has deals with Amazon, Microsoft and Meta to mostly design these systems, yet they are worth “only” 66B. So, you can’t just design chips to be valuable, you have to design systems. The complete systems have to be the best and wanted by everyone (Nvidia, Broadcom) in order to be in Ts; otherwise you’re in Bs (Marvell).

4) I see two problems with selling TPUs: customers and margins. If you want to sell someone a product, it needs to match their use. Currently the use only matches Google’s needs, so who are the customers? Maybe you want to capture hyperscalers / big AI labs, whose use case is likely similar to Google’s. If so, margins would have to be thin, otherwise they just work directly with Broadcom/Marvell (and they all do). If Google wants everyone using CUDA/Nvidia as a customer, then you massively change the purpose of TPU and even Google.

To wrap up, even if TPU is good (and it is good for Google) it wouldn’t be “almost as good a business as every other part of their company” because the value add isn’t FROM Google in the form of a good chip design (TPU). Instead the value add is TO Google in the form of specific compute (ergo) that is cheap and fast FROM relatively simple ASICs (TPU chips) stitched together into massively complex systems (TPU super pods).

Sorry that got a bit long-winded, hope it’s helpful!

Aren't Google's TPUs a bit like a research project with practical applications as a nice side effect?

  • All of Google's ML runs on TPUs, tied to billions of dollars in revenue. You make it sound like TPUs are a Google X startup that's going to get killed tomorrow.

    • What revenue is that? Hardly anyone's paying for Google's AI directly, and it doesn't seem to have dramatically changed their ad business.

  • Why do you say that? They are on their seventh iteration of hardware, and even from the beginning (according to the article) they were designed to serve Google's AI needs.

    My take is "sell access to TPUs on Google cloud" is the nice side effect.

  • I think you're thinking of the Coral ones, the only ones they've sold directly to the public.