Google's real moat isn't the TPU silicon itself—it's not about cooling, individual performance, or hyper-specialization—but rather the massive parallel scale enabled by their OCS interconnects.
To quote The Next Platform: "An Ironwood cluster linked with Google’s absolutely unique optical circuit switch interconnect can bring to bear 9,216 Ironwood TPUs with a combined 1.77 PB of HBM memory... This makes a rackscale Nvidia system based on 144 “Blackwell” GPU chiplets with an aggregate of 20.7 TB of HBM memory look like a joke."
Nvidia may have the superior architecture at the single-chip level, but for large-scale distributed training (and inference) they currently have nothing that rivals Google's optical switching scalability.
Also, Google owns the entire vertical stack, which is what most people need. It can provide an entire spectrum of AI services far cheaper, at scale (and still profitable) via its cloud. Not every company needs to buy the hardware and build models, etc., etc.; what most companies need is an app store of AI offerings they can leverage. Google can offer this with a healthy profit margin, while others will eventually run out of money.
They just need to actually make and market a good product though, and they seem to really struggle with this. Maybe on a long enough timeline their advantages will make this one inevitable.
That is comparing an all-to-all switched NVLink fabric to a 3D torus for TPUs. Those are completely different network topologies with different tradeoffs.
For example, the currently very popular Mixture of Experts architectures require a lot of all-to-all traffic (for expert parallelism), which works a lot better on the switched NVLink fabric, where traffic doesn't need to traverse multiple links the way it does in the torus.
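To make the torus-vs-switch tradeoff concrete, here is a rough back-of-the-envelope sketch. The k/4 average distance per dimension is standard ring arithmetic; everything else (uniform all-to-all traffic, ignoring link bandwidth and congestion) is a deliberate simplification, not a claim about any real TPU pod or NVLink domain.

```python
# Illustrative only: average number of link traversals for uniform all-to-all
# traffic on a flat switched fabric (any pair is one hop through the switch)
# versus a k x k x k torus, where the average shortest-path distance along one
# wrap-around ring of size k is ~k/4 for even k.

def torus_avg_hops(k: int, dims: int = 3) -> float:
    # average wrap-around distance along one ring of k nodes is k/4 (even k)
    return dims * (k / 4)

def switch_avg_hops() -> float:
    return 1.0  # any-to-any through a single switching stage

for k in (4, 8, 16):
    chips = k ** 3
    print(f"{chips:5d} chips: torus ~{torus_avg_hops(k):.1f} hops/message, "
          f"switched fabric ~{switch_avg_hops():.0f} hop")
```

The point isn't that the torus is worse, only that all-to-all traffic pays a per-hop tax there that a switched fabric doesn't, which is what the MoE comment above is getting at.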
This is an underrated point. Comparing just the peak bandwidth is like saying Bulldozer was the far superior CPU of the era because it had a really high frequency ceiling.
Really? Fully-connected hardware isn't buildable (at scale), which we already know from the HPC world. Fat trees and dragonfly networks are pretty scalable, but a 3D torus is a very good tradeoff, and respects the dimensionality of reality.
Bisection bandwidth is a useful metric, but is hop count? Per-hop cost tends to be pretty small.
NVFP4 is the thing no one saw coming. I wasn't really watching the MX process, so I cast no judgements, but it's exactly what it sounds like: a serious compromise for resource-constrained settings. And it's in the silicon pipeline.
NVFP4 is, to put it mildly, a masterpiece: the UTF-8 of its domain, and in strikingly similar ways it is 1. general, 2. robust to gross misuse, 3. not optional if success and cost both matter.
It's not a gap that can be closed by a process node or an architecture tweak: it's an order of magnitude where the polynomials that were killing you on the way up are now working for you.
sm_120 (what NVIDIA's quiet repos call CTA1) consumer gear does softmax attention and projection/MLP blockscaled GEMM at a bit over a petaflop at 300W and close to two (dense) at 600W.
This changes the whole game, and it's not clear anyone outside the lab even knows the new equilibrium points. It's nothing like Flash3 on Hopper: a lot of stuff looks FLOPs-bound, and GDDR7 looks like a better deal than HBM3e. The DGX Spark is in no way deficient; it has ample memory bandwidth.
This has been in the pipe for something like five years and even if everyone else started at the beginning of the year when this was knowable, it would still be 12-18 months until tape out. And they haven't started.
Years Until Anyone Can Compete With NVIDIA is back up to the 2-5 it was 2-5 years ago.
This was supposed to be the year ROCm and the new Intel stuff became viable.
This reads like a badly done, sponsored hype video on YouTube.
So if we look at what NVIDIA has to say about NVFP4, it sure sounds impressive [1]. But look closely: that initial graph never compares FP8 and FP4 on the same hardware. They jump from H100 to B200 while implying a 5x gain from going to FP4, which it isn't. This is accompanied by scary wording, like the claim that MXFP4 carries a "Risk of noticeable accuracy drop compared to FP8".
Contrast that with what AMD has to say about the open MXFP4 approach, which is quite similar to NVFP4 [2]. Oh, the horror of getting 79.6 instead of 79.9 on GPQA Diamond when using MXFP4 instead of FP8.
[1] https://developer.nvidia.com/blog/introducing-nvfp4-for-effi...
[2] https://rocm.blogs.amd.com/software-tools-optimization/mxfp4...
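For readers who haven't followed the formats: both NVFP4 and MXFP4 store 4-bit E2M1 values with a shared per-block scale; as publicly described, MXFP4 uses 32-element blocks with power-of-two scales, while NVFP4 uses 16-element blocks with finer-grained FP8 scales. The toy sketch below is my own simplification (not either spec) and only illustrates why that difference can matter for quantization error.

```python
# Toy block-scaled FP4 ("microscaling") quantizer. The E2M1 value grid and the
# block sizes follow the public format descriptions; the scale handling is
# deliberately simplified and illustrative only.
import numpy as np

E2M1 = np.array([0, 0.5, 1, 1.5, 2, 3, 4, 6])
E2M1 = np.concatenate([-E2M1[::-1], E2M1])  # signed FP4 value grid

def quantize_blockwise(x, block, pow2_scale):
    out = np.empty_like(x)
    for i in range(0, len(x), block):
        blk = x[i:i + block]
        scale = np.max(np.abs(blk)) / 6.0 + 1e-12   # map block max to FP4 max
        if pow2_scale:                               # MX-style: power-of-two scale
            scale = 2.0 ** np.ceil(np.log2(scale))
        idx = np.abs(blk[:, None] / scale - E2M1[None, :]).argmin(axis=1)
        out[i:i + block] = E2M1[idx] * scale
    return out

x = np.random.randn(4096)
for name, block, pow2 in [("MXFP4-like", 32, True), ("NVFP4-like", 16, False)]:
    err = np.abs(quantize_blockwise(x, block, pow2) - x).mean()
    print(f"{name}: mean abs error {err:.4f}")
```

Smaller blocks with finer scales generally track the data more closely, which is roughly the argument NVIDIA makes for NVFP4 over MXFP4; how much that matters in practice is exactly what the two blog posts above disagree about.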
For all the excitement surrounding this, I fail to comprehend how Google can't even meet the current demand for Gemini 3^. Moreover, they are unwilling to invest in expansion directly (apparently have a mandate to double their compute every 6 months without spending more than their current budget). So, pardon me if I can't see how they will scale operations as demand grows while simultaneously selling their chips to competitors?! This situation doesn't make any sense.
^Even now I get capacity related error messages, so many days after the Gemini 3 launch. Also, Jules is basically unusable. Maybe Gemini 3 is a bigger resource hog than anyone outside of Google realizes.
I also suspect Google is launching models it can’t really sustain in volume or that are operating at a loss. Nothing preventing them from like doubling model size compared to the rest or allocating an insane amount of compute just to make the headlines on model performance (clearly it’s good for the stock). These things are opaque anyway, buried deep into the P&L.
Not vibes. TPUs have fallen behind or had to be redesigned from scratch many times as neural architectures and workloads evolved, whereas the more general purpose GPUs kept on trucking and building on their prior investments. There's a good reason so much research is done on Nvidia clusters and not TPU clusters. TPU has often turned out to be over-specialized and Nvidia are pointing that out.
The tweet gives their justification; CUDA isn't ASIC. Nvidia GPUs were popular for crypto mining, protein folding, and now AI inference too. TPUs are tensor ASICs.
FWIW I'm inclined to agree with Nvidia here. Scaling up a systolic array is impressive but nothing new.
OCS is indeed an engineering marvel, but look at NVIDIA's NVL72. They took a different path: instead of flexible optics, they used the brute force of copper, turning an entire rack into one giant GPU with unified memory. Google is solving the scale-out problem, while NVIDIA is solving the scale-up problem. For LLM training tasks, where communication is the bottleneck, NVIDIA's approach with NVLink might actually prove even more efficient than Google's optical routing.
No, not at all. If this were true Google would be killing it in MLPerf benchmarks, but they are not.
It’s better to have a faster, smaller network for model parallelism and a larger, slower one for data parallelism than a very large, but slower, network for everything. This is why NVIDIA wins.
Check the specs again. Per chip, TPU 7x has 192GB of HBM3e, whereas the NVIDIA B200 has 186GB.
While the B200 wins on raw FP8 throughput (~9000 vs 4614 TFLOPs), that makes sense given NVIDIA has optimized for the single-chip game for over 20 years. But the bottleneck here isn't the chip—it's the domain size.
NVIDIA's top-tier NVL72 tops out at an NVLink domain of 72 Blackwell GPUs. Meanwhile, Google is connecting 9216 chips at 9.6Tbps to deliver nearly 43 ExaFlops. NVIDIA has the ecosystem (CUDA, community, etc.), but until they can match that interconnect scale, they simply don't compete in this weight class.
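The aggregate figures quoted in this sub-thread are easy to sanity-check with a few lines of arithmetic; the per-chip numbers below are the ones claimed above, so treat this as a consistency check rather than authoritative specs.

```python
# Sanity-checking the aggregate numbers quoted above (per-chip figures are the
# ones claimed in this thread; treat them as approximate).
tpu_chips, tpu_hbm_gb = 9216, 192
nvl72_gpus, nvl72_hbm_gb = 72, 186   # per-GPU figure claimed a few comments up

print(f"TPU pod HBM: {tpu_chips * tpu_hbm_gb / 1e6:.2f} PB")    # ~1.77 PB
print(f"NVL72 HBM:   {nvl72_gpus * nvl72_hbm_gb / 1e3:.1f} TB") # ~13.4 TB
# The Next Platform's 20.7 TB figure counts 144 Blackwell compute dies
# (2 per packaged GPU), i.e. roughly 144 GB per die: 144 * 144 GB ~= 20.7 TB.
print(f"43 ExaFLOPS / 9216 chips = {43e18 / 9216 / 1e15:.1f} PFLOPS per chip")
```

The last line also lines up with the ~4.6 PFLOPS-per-chip FP8 figure quoted earlier, so the numbers in the thread are at least internally consistent.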
I always enjoy being wrong and I was very wrong in my predictions about Google: I thought they should theoretically win, but I was also very confident they couldn't possibly turn their execution ship around to actually pull together a coherent competitor to OpenAI. But they do seem to have done that and it's very impressive. If they do continue to execute, I can't see anybody stopping them dominating and I would be bearish on nearly every other player catching them.
The biggest problem though is trust, and I'm still holding back from letting anyone under my authority in my org use Gemini because of the lack of any clear or reasonable statement or guidelines on how they use your data. I think it won't matter in the end if they execute their way to domination - but it's going to give everyone else a chance at least for a while.
The LLM provider I trust the most right now is AWS. Anybody else seems to have very conflicted purposes when it comes to sending them my data and interactions.
You're not wrong... but any space where Amazon, of all companies, has a shot at being the "most trustworthy player" is one I'm going to avoid where I can.
It doesn’t help when their Thanksgiving doodle, which sends me to Gemini for help planning how to make Thanksgiving dinner on time, completely fails in ridiculous ways.
This feels a lot like the RISC/CISC debate. More academic than it seems. Nvidia is designing their GPUs primarily to do exactly the same tasks TPUs are doing right now. Even within Google it's probably hard to tell whether or not it matters on a 5-year timeframe. It certainly gives Google an edge on some things, but in the fullness of time "GPUs" like the H100 are primarily used for running tensor models and they're going to have hardware that is ruthlessly optimized for that purpose.
And outside of Google this is a very academic debate. Any efficiency gains over GPUs will primarily turn into profit for Google rather than benefit for me as a developer or user of AI systems. Since Google doesn't sell TPUs, they are extremely well-positioned to ensure no one else can profit from any advantages created by TPUs.
> Since Google doesn't sell TPUs, they are extremely well-positioned to ensure no one else can profit from any advantages created by TPUs.
First part is true at the moment, not sure the second follows. Microsoft is developing their own “Maia” chips for running AI on Azure with custom hardware, and everyone else is also getting in the game of hardware accelerators. Google is certainly ahead of the curve in making full-stack hardware that’s very very specialized for machine learning. But everyone else is moving in the same direction: lots of action is in buying up other companies that make interconnects and fancy networking equipment, and AMD/NVIDIA continue to hyper specialize their data center chips for neural networks.
Google is in a great position, for sure. But I don’t see how they can stop other players from converging on similar solutions.
I feel like this is more like the console/PC debate in the 90s. Consoles like the SNES had dedicated fixed function graphics hardware with weaker general specs, but with the special HW they could perform as well as a much more expensive PC - but as devs made more and more varied and clever games, that fixed function hardware couldn't support it and the PC became the superior choice.
Weird they'd do this after developing several generations of their own inference chip. Google is basically a competitor. This may just be a ploy to get better pricing from Nvidia.
> It is also important to note that, until recently, the GenAI industry’s focus has largely been on training workloads. In training workloads, CUDA is very important, but when it comes to inference, even reasoning inference, CUDA is not that important, so the chances of expanding the TPU footprint in inference are much higher than those in training (although TPUs do really well in training as well – Gemini 3 the prime example).
Does anyone have a sense of why CUDA is more important for training than inference?
NVIDIA chips are more versatile. During training, you might need to schedule things to the SFU (the Special Function Unit that does sin, cos, 1/sqrt(x), etc.), you might need to run epilogues, save intermediate computations, save gradients, etc. When you train, you might need to collect data from various GPUs, so you need to support interconnects, remote SMEM writes, etc.
Once you have trained, you have feed-forward networks consisting of frozen weights that you can just program in and run data over. These weights can be duplicated across any number of devices and just sit there running inference on new data.
If this turns out to be the future use-case for NNs(it is today), then Google are better set.
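As a minimal illustration of that "frozen weights" point (made-up shapes, NumPy instead of any real serving stack), inference over a trained feed-forward block is just fixed matrices applied to new inputs:

```python
# Once trained, a feed-forward block is just fixed matrices applied to new
# data: no gradients, no optimizer state, freely replicable across devices.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((512, 2048)), np.zeros(2048)   # frozen after training
W2, b2 = rng.standard_normal((2048, 512)), np.zeros(512)

def ffn(x):
    h = np.maximum(x @ W1 + b1, 0.0)   # ReLU; weights never change
    return h @ W2 + b2

tokens = rng.standard_normal((4, 512))  # new data just flows through
print(ffn(tokens).shape)                # (4, 512)
```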
Won't the need to train increase as the need for specialized, smaller models increases and we need to train their many variations? Also what about models that continuously learn/(re)train? Seems to me the need for training will only go up in the future.
This is a very important point - the market for training chips might be a bubble, but the market for inference is much, much larger. At some point we might have good enough models and the need for new frontier models will cool down. The big power-hungry datacenters we are seeing are mostly geared towards training, while inference-only systems are much simpler and power efficient.
A real shame, BTW, all that silicon doesn't do FP32 (very well). After training ceases to be that needed, we could use all that number crunching for climate models and weather prediction.
It's already the case that people are eking out most further gains by layering "reasoning" on top of what existing models can do - in other words, using massive amounts of inference to substitute for increases in model performance. Wherever things plateau, I expect this will still be the case - so inference will ultimately always be the end-game market.
Training is taking an enormous problem and trying to break it into lots of pieces and managing the data dependency between those pieces. It's solving 1 really hard problem. Inference is the opposite, it's lots of small independent problems. All of this "we have X many widgets connected to Y many high bandwidth optical telescopes" is all a training problem that they need to solve. Inference is "I have 20 tokens and I want to throw them at these 5,000,000 matrix multiplies, oh and I don't care about latency".
CUDA is just a better dev experience. Lots of training is experiments where developer/researcher productivity matters. Googlers get to use what they're given, others get to choose.
Once you settle on a design then doing ASICs to accelerate it might make sense. But I'm not sure the gap is so big, the article says some things that aren't really true of datacenter GPUs (Nvidia dc gpus haven't wasted hardware on graphics related stuff for years).
I think it’s the same reason Windows is important to desktop computers. Software was written to depend on it. Same with most of the training software out there today being built around CUDA. Even a version difference of CUDA can break things.
It's just more common as a legacy artifact from when nvidia was basically the only option available. Many shops are designing models and functions, and then training and iterating on nvidia hardware, but once you have a trained model it's largely fungible. See how Anthropic moved their models from nvidia hardware to Inferentia to XLA on Google TPUs.
Further, it's worth noting that Ironwood, Google's v7 TPU, supports only up to BF16 (a 16-bit floating point type that has the range of FP32 minus the precision). Many training processes rely upon larger types, quantizing later, so this breaks a lot of assumptions. Yet Google surprised everyone and actually trained Gemini 3 with just that type, so I think a lot of people are reconsidering their assumptions.
This is not the case for LLMs. FP16/BF16 training precision is standard, with FP8 inference very common. But labs are moving to FP8 training and even FP4.
When training a neural network, you usually play around with the architecture and need as much flexibility as possible. You need to support a large set of operations.
Another factor is that training is always done with batches. Inference batching depends on the number of concurrent users. This means training tends to be compute bound where supporting the latest data types is critical, whereas inference speeds are often bottlenecked by memory which does not lend itself to product differentiation. If you put the same memory into your chip as your competitor, the difference is going to be way smaller.
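A quick roofline-style calculation makes the batching point concrete. The hardware numbers below are round illustrative figures, not any particular chip's spec sheet:

```python
# Arithmetic intensity of a (batch x d) @ (d x d) GEMM, to illustrate why
# big-batch training tends to be compute-bound while small-batch inference
# is memory-bound. All hardware numbers are illustrative round figures.
def arithmetic_intensity(batch, d, bytes_per_elem=2):
    flops = 2 * batch * d * d                                   # multiply-accumulates
    bytes_moved = bytes_per_elem * (batch * d + d * d + batch * d)  # in, weights, out
    return flops / bytes_moved

machine_balance = 1000e12 / 4e12   # ~1000 TFLOPs over ~4 TB/s -> 250 FLOPs/byte

for batch in (1, 16, 256, 4096):
    ai = arithmetic_intensity(batch, d=8192)
    bound = "compute-bound" if ai > machine_balance else "memory-bound"
    print(f"batch {batch:5d}: {ai:7.1f} FLOPs/byte -> {bound}")
```

At batch size 1 the weights dominate traffic and the chip mostly waits on memory; only at large batches does the FLOPs advantage (and the latest low-precision data types) really get exercised.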
That quote left me with the same question. Something about decent amount of ram on one board perhaps? That’s advantageous for training but less so for inference?
Inference is often a static, bounded problem solvable by generic compilers. Training requires the mature ecosystem and numerical stability of CUDA to handle mixed-precision operations, unless you rewrite the software from the ground up like Google did; for most companies it's cheaper and faster to buy NVIDIA hardware.
I don't think what the article writes about matters all that much. Gemini 3 Pro is arguably not even the best model anymore, and it's _weeks_ old, and Google has far more resources than Anthropic does. If the hardware actually was the secret sauce, Google would be wiping the floor with little everyone else.
But they're not.
There's a few confounding problems:
1. Actually using that hardware effectively isn't easy. It's not as simple as jacking up some constant values and reaping the benefits. Actually using the hardware is hard, and by the time you've optimized for it, you're already working on the next model.
2. This is a problem that, if you're not Google, you can just spend your way out of. A model doesn't take a petabyte of memory to train or run. Regular old H100s still mostly work fine. Faster models are nice, but Gemini 3 Pro having half the latency of Opus 4.5 or GPT 5.1 doesn't add enough value to matter to really anyone.
3. There's still a lot of clever tricks that work as low hanging fruit to improve almost everything about ML models. You can make stuff remarkably good with novel research without building your own chips.
4. A surprising amount of ML model development is boots on the ground work. Doing evals. Curating datasets. Tweaking system prompts. Having your own Dyson sphere doesn't obviate a lot of the typing and staring at a screen that necessarily has to be done to make a model half decent.
5. Fancy bespoke hardware means fancy bespoke failure modes. You can search stack overflow for CUDA problems, you can't just Bing your way to victory when your fancy TPU cluster isn't doing the thing you want it to do.
I think you are addressing the issue from a developer's perspective. I don't think TPUs are going to be sold to individual users anytime soon. What the article is pointing out is that Google is now able to squeeze significantly more performance per dollar than their peer competitors in the LLM space.
For example, OpenAI has announced trillion-dollar investments in data centers to continue scaling. They need to go through a middle-man (Nvidia), while Google does not, and will be able to use their investment much more efficiently to train and serve their own future models.
> Google is now able to squeeze significantly more performance per dollar than their peer competitors in the LLM space
Performance per dollar doesn't "win" anything though. Performance (as in speed) hardly cracks the top five concerns that most folks have when choosing a model provider, because fast, good models already exist at price points that are acceptable. That might mean slightly better margins for Google, but ultimately isn't going to make them "win"
Google owns 14% of Anthropic and Anthropic is using Google TPUs, as well as AWS Trainium and of course GPUs. It isn't necessary for one company to create both the winning hardware and the winning software to be part of the solution. In fact with the close race in software hardware seems like the better bet.
But price per token isn't even a directly important concern anymore. Anyone with a brain would pay 5x more per token for a model that uses 10x fewer tokens with the same accuracy. I've gone all in on Opus 4.5 because even though it's more expensive, it solves the problems I care about with far fewer tokens.
Slightly more seriously: what you say makes sense if and only if you're projecting Sam Altman and assuming that a) real legit superhuman AGI is just around the corner, and b) all the spoils will accrue to the first company that finds it, which means you need to be 100% in on building the next model that will finally unlock AGI.
But if this is not the case -- and it's increasingly looking like it's not -- it's going to continue to be a race of competing AIs, and that race will be won by the company that can deliver AI at scale the most cheaply. And the article is arguing that company will be Google.
I think you are missing the point. They are saying "weeks old" isn't very old.
> it's going to continue to be a race of competing AIs, and that race will be won by the company that can deliver AI at scale the most cheaply.
I don't see how that follows at all. Quality and distribution both matter a lot here.
Google has some advantages but some disadvantages here too.
If you are on AWS GovCloud, Anthropic is right there. Same on Azure, and on Oracle.
I believe Gemini will be available on the Oracle Cloud at some point (it has been announced) but they are still behind in the enterprise distribution race.
OpenAI is only available on Azure, although I believe their new contract lets them strike deals elsewhere.
On the consumer side, OpenAI and Google are well ahead of course.
Last week it looked like Google had won (hence the blog post), but now almost nobody is talking about Antigravity and Gemini 3 anymore, so yeah, what OP says is relevant.
It definitely depends on how you're measuring. But the benchmarks don't put it at the top for many ways of measuring, and my own experience doesn't put it at the top. I'm glad if it works for you, but it's not even a month old and there are lots of folks like me who see it as definitely worse for classes of problems that 3 Pro could be the best at.
Which is to say, if Google was set up to win, it shouldn't even be a question that 3 Pro is the best. It should be obvious. But it's definitely not obvious that it's the best, and many benchmarks don't support it as being the best.
On point 5, I think this is the real moat for CUDA. Does Google have tools to optimize kernels on their TPUs? Do they have tools to optimize successive kernel launches on their TPUs? How easy is it to debug on a TPU (arguably CUDA could use work here, but still...)? Does Google help me fully utilize their TPUs? Can I warm up a model on a TPU, checkpoint it, and launch the checkpoints to save time?
I am fairly pro-Google (they invented the LLM, FFS...) and recognize the advantages (price/token, efficiency, vertical integration, established DCs with power allocations), but I also know they have a habit of slightly sucking at everything but search.
A question I don't see addressed in all these articles: what prevents Nvidia from doing the same thing and iterating on their more general-purpose GPU towards a more focused TPU-like chip as well, if that turns out to be what the market really wants?
The big difference is that Google is both the chip designer *and* the AI company. So they get both sets of profits.
Both Google and Nvidia contract TSMC for chips. Then Nvidia sells them at a huge profit. Then OpenAI (for example) buys them at that inflated rate and them puts them into production.
So while Nvidia is "selling shovels", Google is making their own shovels and has their own mines.
On top of that, Google is also a cloud infrastructure provider, unlike OpenAI, which needs someone like Azure to plug in those GPUs and host the servers.
The own shovels for own mines strategy has a hidden downside: isolation. NVIDIA sells shovels to everyone - OpenAI, Meta, xAI, Microsoft - and gets feedback from the entire market. They see where the industry is heading faster than Google, which is stewing in its own juices. While Google optimizes TPUs for current Google tasks (Gemini, Search), NVIDIA optimizes GPUs for all possible future tasks. In an era of rapid change, the market's hive mind usually beats closed vertical integration.
So when the bubble pops the companies making the shovels (TSMC, NVIDIA) might still have the money they got for their products and some of the ex-AI companies might least be able to sell standard compliant GPUs on the wider market.
And Google will end up with lots of useless super specialized custom hardware.
Selling shovels may still turn out to be the right move: Nvidia got rich off the cryptocurrency bubble, now they're getting even richer off the AI bubble.
Having your own mines only pays off if you actually do strike gold. So far AI undercuts Google's profitable search ads, and loses money for OpenAI.
Deepmind gets to work directly with the TPU team to make custom modifications and designs specifically for deepmind projects. They get to make pickaxes that are made exactly for the mine they are working.
Everyone using Nvidia hardware has a lot of overlap in requirements, but they also all have enough architectural differences that they won't be able to match Google.
OpenAI announced they will be designing their own chips, exactly for this reason, but that also becomes another extremely capital intensive investment for them.
This also doesn't get into the fact that Google already has S-tier datacenters and datacenter construction/management capabilities.
Isn’t there a suspicion that OpenAI buying custom chips from another Sam Altman venture is just graft? Wasn’t that one of the things that came up when the board tried to oust him?
> Deepmind gets to work directly with the TPU team to make custom modifications
You don't think Nvidia has field-service engineers and applications engineers with their big customers? Come on man. There is quite a bit of dialogue between the big players and the chipmaker.
It's not that the TPU is better than an NVidia GPU, it's just that it's cheaper since it doesn't have a fat NVidia markup applied, and is also better vertically integrated since it was designed/specified by Google for Google.
TPUs are also cheaper because GPUs need to be more general purpose whereas TPUs are designed with a focus on LLM workloads meaning there's not wasted silicon. Nothing's there that doesn't need to be there. The potential downside would be if a significantly different architecture arises that would be difficult for TPUs to handle and easier for GPUs (given their more general purpose). But even then Google could probably pivot fairly quickly to a different TPU design.
Except the native width of Tensor Cores is about 8-32 (depending on scalar type), whereas the width of TPUs is up to 256. The difference in scale is massive.
That's pretty much what they've been doing incrementally with the data center line of GPUs versus GeForce since 2017. Currently, the data center GPUs now have up to 6 times the performance at matrix math of the GeForce chips and much more memory. Nvidia has managed to stay one tape out away from addressing any competitors so far.
The real challenge is getting the TPU to do more general purpose computation. But that doesn't make for as good a story. And the point about Google arbitrarily raising the prices as soon as they think they have the upper hand is good old fashioned capitalism in action.
Nvidia doesn't have the software stack to do a TPU.
They could make a systolic array TPU and software, perhaps. But it would mean abandoning 18 years of CUDA.
The top post right now is talking about TPU's colossal advantage in scaling & throughput. Ironwood is already massively bigger & faster than what Nvidia is shooting for. And that's a huge advantage. But imo that is a replicable win. Throw gobs more at networking and scaling and Nvidia could do similar with their architecture.
The architectural win of what TPU is more interesting. Google sort of has a working super powerful Connection Machine CM-1. The systolic array is a lot of (semi-)independent machines that communicate with nearby chips. There's incredible work going on to figure out how to map problems onto these arrays.
Whereas on a GPU, main memory is used to transfer intermediary results. It doesn't really matter who picks up work; there's lots of worklets with equal access time to that bit of main memory. The actual situation is a little more nuanced (even in consumer GPUs there are really multiple different main memories, which creates some locality), but there's much less need for data locality on the GPU, while the TPU has much, much tighter needs: the whole premise of the TPU is to exploit data locality, because sending data to a neighbor is cheap, while storing and retrieving data from memory is slower and much more energy intensive.
CUDA takes advantage of, and relies strongly on, the GPU's main memory being (somewhat) globally accessible. There's plenty of workloads folks do in CUDA that would never work on a TPU, on these much more specialized data-passing systolic arrays. That's why TPUs are so amazing: they are much more constrained devices that require much more careful workload planning to get the work to flow across the 2D array of the chip.
Google's work on projects like XLA and IREE is a wonderful & glorious general pursuit of how to map these big crazy machine learning pipelines down onto specific hardware. Nvidia could make their own or join forces here. And perhaps they will. But the CUDA moat would have to be left behind.
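A toy simulation of the data flow being described, assuming an output-stationary schedule (real TPUs are weight-stationary and far more elaborate), shows how each cell only ever touches values handed over by its neighbors:

```python
# Toy output-stationary systolic array: operands are injected in a skewed
# schedule so that at step t, cell (i, j) consumes A[i, t-i-j] arriving from
# its left neighbor and B[t-i-j, j] arriving from above, accumulating one
# output element. No cell ever reaches into a global memory mid-computation.
import numpy as np

def systolic_matmul(A, B):
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))
    for t in range(n + m + k - 2):          # global clock ticks
        for i in range(n):
            for j in range(m):
                s = t - i - j               # which operand pair arrives now
                if 0 <= s < k:
                    C[i, j] += A[i, s] * B[s, j]
    return C

A, B = np.random.rand(4, 6), np.random.rand(6, 5)
print(np.allclose(systolic_matmul(A, B), A @ B))  # True
```

It models only the timing/locality pattern, not the real hardware, but it makes the contrast with a CUDA-style "everyone reads shared memory" model fairly vivid.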
For sure, I did not mean to imply they could do it quickly or easily, but I have to assume that internally at Nvidia there's already work happening to figure out "can we make chips that are better for AI and cheaper/easier to make than GPUs?"
It’s not binary. It’s not existential. What’s at stake for Nvidia is its HUGE profit margins. 5 years from now, Nvidia could be selling 100x as many chips. But its market cap could be a fraction of what it is now if competition is so intense that its making 5% profit margin instead of 90%.
My personal guess would be that what drives the cost and size of these chips is the memory bandwidth and the transceivers required to support it. Since transceivers/memory controllers are on the edge of the chip, you get a certain minimum circumference for a given bandwidth, which determines your minimum surface area.
It might even be 'free' to fill it with more complicated logic (especially logic that lets you write clever algorithms that save on bandwidth).
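A rough sketch of that intuition, with invented numbers for the achievable bandwidth per millimeter of die edge:

```python
# "Bandwidth sets the perimeter, perimeter sets the area" intuition.
# Both numbers below are invented for illustration, not real PHY specs.
bw_tb_s = 8.0            # desired off-chip bandwidth, TB/s
bw_per_mm_gb_s = 100.0   # assumed beachfront bandwidth, GB/s per mm of die edge

perimeter_mm = bw_tb_s * 1000 / bw_per_mm_gb_s   # edge length needed for the PHYs
side_mm = perimeter_mm / 4                        # assume a square die
area_mm2 = side_mm ** 2

print(f"perimeter needed: {perimeter_mm:.0f} mm -> "
      f"~{side_mm:.0f} mm per side -> ~{area_mm2:.0f} mm^2 minimum die area")
# Double the bandwidth and the minimum area quadruples, which is why filling
# that area with extra logic can look almost 'free'.
```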
Nothing in principle.
But Huang probably doesn't believe in hyper specializing their chips at this stage because it's unlikely that the compute demands of 2035 are something we can predict today.
For a counterpoint, Jim Keller took Tenstorrent in the opposite direction. Their chips are also very efficient, but even more general purpose than NVIDIA chips.
How is Tenstorrent h/w more general purpose than NVIDIA chips? TT hardware is only good for matmuls and some elementwise operations, and plain sucks for anything else. Their software is abysmal.
For users buying H200s for AI workloads, the "ASIC" tensor cores deliver the overwhelming bulk of performance. So they already do this, and have been since Volta in 2017.
To put it into perspective, the tensor cores deliver about 2,000 TFLOPs of FP8, and half that for FP16, and this is all tensor FMA/MAC (comprising the bulk of compute for AI workloads). The CUDA cores -- the rest of the GPU -- deliver more in the 70 TFLOP range.
So if data centres are buying nvidia hardware for AI, they already are buying focused TPU chips that almost incidentally have some other hardware that can do some other stuff.
I mean, GPUs still have a lot of non-tensor general uses in the sciences, finance, etc, and TPUs don't touch that, but yes a lot of nvidia GPUs are being sold as a focused TPU-like chip.
Is it the CUDA cores that run the vertex/fragment/etc. shaders in normal GPUs? Where do the ray tracing units fit in? How much of a modern Nvidia GPU is general purpose vs specialized to graphics pipelines?
> what prevents Nvidia from doing the same thing and iterating on their more general-purpose GPU towards a more focused TPU-like chip as well, if that turns out to be what the market really wants.
Nothing prevents them per se, but it would risk cannibalising their highly profitable (IIRC 50% margin) higher end cards.
- ASIC won the crypto mining battle in the past, it's orders of magnitude faster
- Google doesn't just own the technology, it builds a cohesive cloud around it; Tesla, Meta, and I guess others are working on their own ASIC AI chips
- A signal has already been given: SoftBank sold its entire Nvidia stake and Berkshire added Google to its portfolio.
Microsoft "has" a lot of companies' data, and Google is probably building the most advanced AI cloud.
However, I can't help thinking: they had a cloud which was light-years ahead of AWS 15 years ago and GCP is now No. 3, and they also released open-source GPT models more than 5 years ago that constituted the foundation for OpenAI's closed-source models.
If Google won, it would cannibalize its current ad-driven business and replace it with something that is extremely expensive to run and difficult to make profit from. A Pyrrhic win essentially.
But that would happen regardless of who won, better to at least dominate the new paradigm and figure out how to extract value from it. I also suspect that once the value generation is figured out they will cease offering these APIs to anyone, if you had a golden goose would you rent it?
I mean, focus is a thing that Google has always struggled with. But I kind of doubt that customers who need online marketing (ads) are going to convert en masse to users who rent cloud TPUs instead.
I have read in the past that ASICs for LLMs are not as simple a solution compared to cryptocurrency. In order to design and build the ASIC you need to commit to a specific architecture: a hashing algorithm for a cryptocurrency is fixed but the LLMs are always changing.
Am I misunderstanding "TPU" in the context of the article?
LLMs require memory and interconnect bandwidth so needs a whole package that is capable of feeding data to the compute. Crypto is 100% compute bound. Crypto is a trivially parallelized application that runs the same calculation over N inputs.
Regardless of architecture (which is anyways basically the same for all LLMs), the computational needs of modern neural networks are pretty generic, centered around things like matrix multiply, which is what the TPU provides. There is even TPU support for some operations built into PyTorch - it is not just a proprietary interface that Google use themselves.
"Application-specific" doesn't necessarily mean unprogrammable. Bitcoin miners aren't programmable because they don't need to be. TPUs are ASICs for ML and need to be programmable so they can run different models. In theory, you could make an ASIC hardcoded for a specific model, but given how fast models evolve, it probably wouldn't make much economic sense.
It’s true that architectures change, but they are built from common components. The most important of those is matrix multiplication, using a relatively small set of floating point data types. A device that accelerates those operations is, effectively, an ASIC for LLMs.
Cryptocurrency architectures also change - Bitcoin is just about the lone holdout that never evolves. The hashing algorithm for Monero is designed so that a Monero hashing ASIC is literally just a CPU, and it doesn't even matter what the instruction set is.
The funniest thing about this story is that NVIDIA has essentially become a TPU company. Look at the Hopper and Blackwell architectures: Tensor Cores are taking up more space, the Transformer Engine has appeared, and NVLink has started to look like a supercomputer interconnect. Jensen Huang isn't stupid. He saw the threat of specialized ASICs and just built the ASIC inside the GPU. Now we have a GPU that is 80% matrix multiplier but still keeps CUDA compatibility. Google tried to kill the GPU, but instead forced the GPU to mutate into a TPU
The part that surprised me is how much TPUs gain from the systolic array design. It basically cuts down the constant memory shuffling that GPUs have to do, so more of the chip’s time is spent actually computing.
The downside is the same thing that makes them fast: they’re very specialized. If your code already fits the TPU stack (JAX/TensorFlow), you get great performance per dollar. If not, the ecosystem gap and fear of lock-in make GPUs the safer default.
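For what it's worth, the "fits the TPU stack" path can be as small as the following JAX sketch; on a Cloud TPU VM the jitted matmul lowers through XLA to the matrix units, while on anything else the same code just runs on CPU/GPU (shapes and the ReLU block here are illustrative, not any particular model):

```python
# Minimal JAX/XLA example: jit-compile a bf16 matmul-plus-activation block.
import jax
import jax.numpy as jnp

@jax.jit
def block(x, w):
    return jax.nn.relu(x @ w)

x = jnp.ones((128, 1024), dtype=jnp.bfloat16)
w = jnp.ones((1024, 1024), dtype=jnp.bfloat16)
print(block(x, w).shape)   # (128, 1024)
print(jax.devices())       # lists TpuDevice entries when run on a TPU host
```

If your code already looks like this, the TPU performance-per-dollar argument is easy to test; if it's a pile of custom CUDA kernels, it isn't, which is the lock-in point being made above.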
Given the importance of scale for this particular product, any company placing itself on "just" one layer of the whole story is at a heavy disadvantage, I guess. I'd rather have a winning google than openai or meta anyway.
I agree, it would be the best of bad cases, in a sense. I have low trust in OpenAI due to its leadership, and in Meta, because, well, Meta has history, let's say.
Sparse models have the same quality of results but fewer coefficients to process; in the case described in the link above, sixteen (16) times fewer.
This means that these models need 8 times less data to store, can be 16 or more times faster, and use 16+ times less energy.
TPUs are not all that good in the case of sparse matrices. They can be used to train dense versions, but inference efficiency with sparse matrices may not be all that great.
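One way the "16x fewer coefficients but only ~8x less storage" arithmetic can work out is index overhead in an unstructured sparse format; the sketch below uses illustrative sizes, not figures from the link being discussed:

```python
# Illustrative storage arithmetic: every kept weight in an unstructured sparse
# format also needs an index, roughly doubling its per-element cost.
dense_weights = 1_000_000
bytes_per_value = 2          # fp16/bf16 weight
bytes_per_index = 2          # e.g. a 16-bit column index

dense_bytes = dense_weights * bytes_per_value
kept = dense_weights // 16   # 16x fewer coefficients
sparse_bytes = kept * (bytes_per_value + bytes_per_index)

print(f"dense:  {dense_bytes / 1e6:.2f} MB")
print(f"sparse: {sparse_bytes / 1e6:.2f} MB  ({dense_bytes / sparse_bytes:.0f}x smaller)")
```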
It's all about the right mindset at the very top level. At the beginning of the PC era, nobody would have bet on IBM losing. Same at the dawn of the internet: all the money was on MS. So it went with Nokia and Ericsson too.
Google is a giant without a direction. The ads money is so good that it just doesn't have the gut to leave it on the table.
With its AI offerings, can Google suck the oxygen out of AWS? AWS grew big because of compute. The AI spend will be far larger than compute. Can Google launch AI/Cloud offerings with free compute bundled? Use our AI, and we'll throw in compute for free.
It's a cool subject and article, covering things I only have a general understanding of (considering the place of posting).
What I'm sure about is that a processing unit purpose-built for a task is more optimal than a general-purpose unit designed to accommodate all programming tasks.
More and more of the economics of computing boils down to energy usage and, invariably, to physical rules; a more efficient process has the benefit of less energy consumed.
As a layman, that makes general sense to me. Maybe a future where productivity is based more on energy efficiency than on monetary gain pushes the economy in better directions.
Cryptocurrency and LLMs seem like they'll play out that story over the next 10 years.
How much of current GPU and TPU design is based around attention's bandwidth-hungry design? The article makes it seem like TPUs aren't very flexible, so big model architecture changes, like new architectures that don't use attention, may lead to useless chips. That being said, I think it is great that we have some major competing architectures out there. GPUs, TPUs, and UMA CPUs are all attacking the ecosystem in different ways, which is what we need right now. Diversity in all things is always the right answer.
This is a bizarre argument to make for AI, since Google started working on TPUs in 2013 (12 years ago) and Sundar started publicly banging on about being an AI-first company in 2016. They missed the first boat on LLMs, but Google has been invested in AI for way longer than any of the competition.
Their incentive structure doesn't lead to longevity. Nobody gets promoted for keeping a product alive, they get promoted for shipping something new. That's why we're on version 37 of whatever their chat client is called now.
I think we can be reasonably sure that search, Gmail, and some flavor of AI will live on, but other than that, Google apps are basically end-of-life at launch.
It's also paradoxically the talent in tech that isolates them. The internal tech stack is so incredibly specialized, most Google products have to either be built for internal users or external users.
Agree there are lots of other contributing causes like culture, incentives, security, etc.
It's telling that basically all of Google's successful projects were either acquisitions or were sponsored directly by the founders (or sometimes, were acquisitions that were directly sponsored by the founders). Those are the only situations where you are immune from the performance review & promotion process.
DeepSeek kind of innovated on this using off-the-shelf components, right?
To quote from their paper:
"In order to ensure sufficient computational performance for DualPipe, we customize efficient cross-node all-to-all communication kernels (including dispatching and combining) to conserve the number of SMs dedicated to communication. The implementation of the kernels is co-designed with the MoE gating algorithm and the network topology of our cluster."
> The GPUs were designed for graphics [...] However, because they are designed to handle everything from video game textures to scientific simulations, they carry “architectural baggage.” [...] A TPU, on the other hand, strips away all that baggage. It has no hardware for rasterization or texture mapping.
With simulations becoming key to training models doesn't this seem like a huge problem for Google?
All this assumes that LLMs are the sole mechanism for AI and will remain so forever: no novel architectures (neither hardware nor software), no progress in AI theory, nothing better than LLMs, simply brute force LLM computation ad infinitum.
Perhaps the assumptions are true. The mere presence of LLMs seems to have lowered the IQ of the Internet drastically, sopping up financial investors and resources that might otherwise be put to better use.
The most fun fact about all the developments post-ChatGPT is that people apparently forgot that Google was doing actual AI before AI meant (only) ML and GenAI/LLMs, and they were top players at it.
Arguably main OpenAI raison d'être was to be a counterweight to that pre-2023 Google AI dominance. But I'd also argue that OpenAI lost its way.
That and they were harvesting data way before it was cool, and now that it is cool, they're in a privileged position since almost no-one can afford to block GoogleBot.
They do voluntarily offer a way to signal that the data GoogleBot sees is not to be used for training, for now, and assuming you take them at their word, but AFAIK there is no way to stop them doing RAG on your content without destroying your SEO in the process.
I have never understood why, in these discussions, nobody brings up other specialized silicon providers like Groq, SambaNova, or my personal favorite, Cerebras.
Cerebras CS-3 specs:
• 4 trillion transistors
• 900,000 AI cores
• 125 petaflops of peak AI performance
• 44GB on-chip SRAM
• 5nm TSMC process
• External memory: 1.5TB, 12TB, or 1.2PB
• Trains AI models up to 24 trillion parameters
• Cluster size of up to 2048 CS-3 systems
• Memory B/W of 21 PB/s
• Fabric B/W of 214 Pb/s (~26.75 PB/s)
Comparing GPU to TPU is helpful for showcasing the advantages of the TPU in the same way that comparing CPU to Radeon GPU is helpful for showcasing the advantages of GPU, but everyone knows Radeon GPU's competition isn't CPU, it's Nvidia GPU!
TPU vs GPU is new paradigm vs old paradigm. GPUs aren't going away even after they "lose" the AI inference wars, but the winner isn't necessarily guaranteed to be the new paradigm chip from the most famous company.
Cerebras inference remains the fastest on the market to this day to my knowledge due to the use of massive on-chip SRAM rather than DRAM, and to my knowledge, they remain the only company focused on specialized inference hardware that has enough positive operating revenue to justify the costs from a financial perspective.
I get how valuable and important Google's OCS interconnects are, not just for TPUs or inference, but really as a demonstrated PoC for computing in general. Skipping the E-O-E translation in general is huge and the entire computing hardware industry would stand to benefit from taking notes here, but that alone doesn't automatically crown Google the victor here, does it?
Will Google sell TPUs that can be plugged into stock hardware, or custom hardware with lots of TPUs? Our customers want all their video processing to happen on site, and don't want their video or other data to touch the cloud, so they're not happy about renting cloud TPUs or GPUs. Also it would be nice to have smart cameras with built-in TPUs.
It's not Google Cloud per se, it's any cloud. There are a million reasons not to trust (or spend money on) any cloud. They want all their video and data on premises and completely under their control.
You can't really buy a TPU; you have to buy into the entire data center that includes the TPU, plus the services and support. In Google Colab, I often don't prefer the TPU either, because the documentation for the AI isn't made for it. While this could all change in the long term, I also don't see these changes in Google's long-term strategy. There's also the problem of Google's graveyard, which the original article doesn't mention in its long-term view. Given these factors, I'm still skeptical about Google's lead on AI.
That's mentioned in the article, but is the lock-in really that big? In some cases, it's as easy as changing the backend of your high-level ML library.
That's what it is on paper. But in practice you trade one set of hardware idiosyncrasies for another and unless you have the right people to deal with that, it's a hassle.
That's actually one of the reasons why Google might win.
Nvidia is tied down to support previous and existing customers while Google can still easily shift things around without needing to worry too much about external dependencies.
Wait until Apple's ChromeBook competitor shows up to eat their lunch just like switching to another proprietary stack with no dev ecosystem will die out. Sure they'll go after big ticket accounts, also take a guess at what else gets sanctioned next.
This is the “Microsoft will dominate the Internet” stage.
The truth is the LLM boom has opened the first major crack in Google as the front page of the web (the biggest since Facebook), in the same way the web in the long run made Windows so irrelevant Microsoft seemingly don’t care about it at all.
Exactly, ChatGPT pretty much ate away ad volume & retention, as if the already garbage search results weren't enough. Don't even get me started on Android & Android TV as an ecosystem.
How high are the chances that as soon as China produces their own competitive TPU/GPU, they'll invade Taiwan in order to starve the West in regards to processing power, while at the same time getting an exclusive grip on the Taiwanese Fabs?
The US would destroy TSMC before letting China have it. China also views military conquest of Taiwan as less than ideal for a number of reasons, so I think right now it's seen as a potential defensive move in the face of American aggression.
Seems low at the moment, with the concept of a G2 being floated as a generic understanding of China's ascension to where Russia used to be, effectively recreating a bipolar, semi-Cold-War world order. Mind, I am not saying it's impossible, but there are reasons China would want to avoid this scenario (it's probably one of the few things the US would not tolerate and would likely retaliate against).
If they have the fabs but ASML doesn't send them their new machines, they will just end up in the same situation as now, just one generation later. If China wants to compete, they need to learn how to make the EUV light and mirrors.
Not very. Those fabs are vulnerable things, shame if something happens to them. If China attacks, it would be for various other reasons and processors are only one of many considerations, no matter how improbable it might sound to an HN-er.
What if China becomes self-sufficient enough to no longer rely on Taiwanese Fabs, and hence having no issues with those Fabs getting destroyed. That would put China as the leader once and for all.
Highly unlikely. Despite the rampant anti-Chinese FUD that's so prevalent in the media (and, sadly, here on HN), China isn't really in the habit of invading other lands.
In my 20+ years of following NVIDIA, I have learned to never bet against them long-term. I actually do not know exactly why they continually win, but they do. The main issue is that they have a 3-4 year gap between wanting a new design pivot and realizing it (silicon has a long "pipeline"); when it seems they may be missing a new trend or swerve in the demands of the market, it is often simply because of this delay.
Fair, but the 75% margins can be reduced to 25% with healthy competition. The lack of competition in the frontier chips space was always the bottleneck to commoditization of computation, if such a thing is even possible
Google's work on JAX, PyTorch, TensorFlow, and the more general XLA underneath is exactly the kind of anti-moat everyone has been clamoring for.
All this vertical integration; no wonder Apple and Google have such a tight relationship.
They had a plan.
This comment reads as if it were LLM-generated.
It's fun when you then read the latest Nvidia tweet [1] suggesting that their tech is still better, based on pure vibes like everything in the (Gen)AI era.
[1] https://x.com/nvidianewsroom/status/1993364210948936055
> NVIDIA is a generation ahead of the industry
a generation is 6 months
I mean, Google just isn't participating it seems?
100 times more chips for equivalent memory, sure.
Check the specs again. Per chip, TPU 7x has 192GB of HBM3e, whereas the NVIDIA B200 has 186GB.
While the B200 wins on raw FP8 throughput (~9000 vs 4614 TFLOPs), that makes sense given NVIDIA has optimized for the single-chip game for over 20 years. But the bottleneck here isn't the chip—it's the domain size.
NVIDIA's top-tier NVL72 tops out at an NVLink domain of 72 Blackwell GPUs. Meanwhile, Google is connecting 9216 chips at 9.6Tbps to deliver nearly 43 ExaFlops. NVIDIA has the ecosystem (CUDA, community, etc.), but until they can match that interconnect scale, they simply don't compete in this weight class.
Ironwood is 192GB, Blackwell is 96GB, right? Or am I missing something?
I think it's not about the cost but the limits of quickly accessible RAM
I always enjoy being wrong and I was very wrong in my predictions about Google: I thought they should theoretically win, but I was also very confident they couldn't possibly turn their execution ship around to actually pull together a coherent competitor to OpenAI. But they do seem to have done that and it's very impressive. If they do continue to execute, I can't see anybody stopping them dominating and I would be bearish on nearly every other player catching them.
The biggest problem though is trust, and I'm still holding back from letting anyone under my authority in my org use Gemini because of the lack of any clear or reasonable statement or guidelines on how they use your data. I think it won't matter in the end if they execute their way to domination - but it's going to give everyone else a chance at least for a while.
> because of the lack of any clear or reasonable statement or guidelines on how they use your data.
They’ve been very clear, in my opinion: https://cloud.google.com/gemini/docs/discover/data-governanc...
I suppose there will always be the people who refuse to trust them or choose to believe they’re secretly doing something different.
However I’m not sure what you’re referring to by saying they haven’t said anything about how data is used.
The LLM provider I trust the most right now is AWS. Anybody else seems to have very conflicted purposes when it comes to sending them my data and interactions.
You're not wrong... but any space where Amazon, of all companies, has a shot at being the "most trustworthy player" is one I'm going to avoid where I can.
Amazon makes an LLM?
It doesn’t help when their Thanksgiving doodle, which sends me to Gemini to plan making Thanksgiving dinner on time, completely fails in ridiculous ways.
Unless they nerf Gemini 3.0 after a few weeks like they did with 2.5. Remember?
> If they do continue to execute
Yes, but Google will never be able to compete with their greatest challenge... Google's attention span.
This feels a lot like the RISC/CISC debate. More academic than it seems. Nvidia is designing their GPUs primarily to do exactly the same tasks TPUs are doing right now. Even within Google it's probably hard to tell whether or not it matters on a 5-year timeframe. It certainly gives Google an edge on some things, but in the fullness of time "GPUs" like the H100 are primarily used for running tensor models and they're going to have hardware that is ruthlessly optimized for that purpose.
And outside of Google this is a very academic debate. Any efficiency gains over GPUs will primarily turn into profit for Google rather than benefit for me as a developer or user of AI systems. Since Google doesn't sell TPUs, they are extremely well-positioned to ensure no one else can profit from any advantages created by TPUs.
Google does not sell them, but you can rent them:
https://cloud.google.com/tpu
As you note, they'll set the margins to benefit themselves, but you can still eke out some benefit.
Also, you can buy Edge TPUs, but as the name says these are for edge AI inference and useless for any heavy lifting workloads like training or LLMs.
https://www.amazon.com/Google-Coral-Accelerator-coprocessor-...
> Since Google doesn't sell TPUs, they are extremely well-positioned to ensure no one else can profit from any advantages created by TPUs.
First part is true at the moment, not sure the second follows. Microsoft is developing their own “Maia” chips for running AI on Azure with custom hardware, and everyone else is also getting in the game of hardware accelerators. Google is certainly ahead of the curve in making full-stack hardware that’s very very specialized for machine learning. But everyone else is moving in the same direction: lots of action is in buying up other companies that make interconnects and fancy networking equipment, and AMD/NVIDIA continue to hyper specialize their data center chips for neural networks.
Google is in a great position, for sure. But I don’t see how they can stop other players from converging on similar solutions.
I feel like this is more like the console/PC debate in the 90s. Consoles like the SNES had dedicated fixed function graphics hardware with weaker general specs, but with the special HW they could perform as well as a much more expensive PC - but as devs made more and more varied and clever games, that fixed function hardware couldn't support it and the PC became the superior choice.
This is highly relevant:
"Meta in talks to spend billions on Google's chips, The Information reports"
https://www.reuters.com/business/meta-talks-spend-billions-g...
Weird they'd do this after developing several generations of their own inference chip. Google is basically a competitor. This may just be a ploy to get better pricing from Nvidia.
keyword: "...talks..."
> It is also important to note that, until recently, the GenAI industry’s focus has largely been on training workloads. In training workloads, CUDA is very important, but when it comes to inference, even reasoning inference, CUDA is not that important, so the chances of expanding the TPU footprint in inference are much higher than those in training (although TPUs do really well in training as well – Gemini 3 the prime example).
Does anyone have a sense of why CUDA is more important for training than inference?
NVIDIA chips are more versatile. During training, you might need to schedule things to the SFU (Special Function Unit that does sin, cos, 1/sqrt(x), etc.), you might need to run epilogues, save intermediary computations, save gradients, etc. When you train, you might need to collect data from various GPUs, so you need to support interconnects, remote SMEM writing, etc.
Once you have trained, you have a frozen feed-forward network: weights you can just program in and run data over. These weights can be duplicated across any number of devices and just sit there and run inference on new data.
If this turns out to be the future use-case for NNs (it is today), then Google is better set.
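A toy numpy sketch of the asymmetry being described (hypothetical two-layer net, purely illustrative): training has to keep intermediates, compute gradients, and update weights, while inference is a forward pass over frozen weights that can be copied to as many devices as you like.

    import numpy as np

    rng = np.random.default_rng(0)
    W1, W2 = rng.standard_normal((64, 128)), rng.standard_normal((128, 10))

    def forward(x, W1, W2):
        h = np.maximum(x @ W1, 0)             # hidden activations must be kept around for training
        return h @ W2, h

    def train_step(x, y, W1, W2, lr=1e-3):
        # Training: intermediates, gradients, weight updates (and, at scale, collectives).
        out, h = forward(x, W1, W2)
        d_out = out - y                        # gradient of squared error
        dW2 = h.T @ d_out
        dW1 = x.T @ ((d_out @ W2.T) * (h > 0))
        return W1 - lr * dW1, W2 - lr * dW2

    def infer(x, W1, W2):
        # Inference: frozen weights, forward pass only, trivially replicated.
        return forward(x, W1, W2)[0]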
Won't the need to train increase as the need for specialized, smaller models increases and we need to train their many variations? Also what about models that continuously learn/(re)train? Seems to me the need for training will only go up in the future.
All of those are things you can do with TPUs
This is a very important point - the market for training chips might be a bubble, but the market for inference is much, much larger. At some point we might have good enough models and the need for new frontier models will cool down. The big power-hungry datacenters we are seeing are mostly geared towards training, while inference-only systems are much simpler and power efficient.
A real shame, BTW, that all that silicon doesn't do FP32 (very well). Once training is no longer in such demand, we could use all that number crunching for climate models and weather prediction.
It's already the case that people are eking out most further gains by layering "reasoning" on top of what existing models can do - in other words, using massive amounts of inference to substitute for increases in model performance. Wherever things plateau, I expect this will still be the case - so inference will ultimately be the end-game market.
Some more traditional number crunching has long looked at lower- and mixed-precision hardware.
Training is taking an enormous problem and trying to break it into lots of pieces and managing the data dependency between those pieces. It's solving 1 really hard problem. Inference is the opposite, it's lots of small independent problems. All of this "we have X many widgets connected to Y many high bandwidth optical telescopes" is all a training problem that they need to solve. Inference is "I have 20 tokens and I want to throw them at these 5,000,000 matrix multiplies, oh and I don't care about latency".
I can't think of any case where inference doesn't care about latency.
CUDA is just a better dev experience. Lots of training is experiments where developer/researcher productivity matters. Googlers get to use what they're given, others get to choose.
Once you settle on a design then doing ASICs to accelerate it might make sense. But I'm not sure the gap is so big, the article says some things that aren't really true of datacenter GPUs (Nvidia dc gpus haven't wasted hardware on graphics related stuff for years).
I think it’s the same reason Windows is important to desktop computers. Software was written to depend on it. Same with most of the software out there today to train being built around CUDA. Even a version difference of CUDA can break things.
It's just more common as a legacy artifact from when nvidia was basically the only option available. Many shops are designing models and functions, and then training and iterating on nvidia hardware, but once you have a trained model it's largely fungible. See how Anthropic moved their models from nvidia hardware to Inferentia to XLA on Google TPUs.
Further, it's worth noting that Ironwood, Google's v7 TPU, supports only up to BF16 (a 16-bit floating point type that has the range of FP32 minus the precision). Many training processes rely upon larger types, quantizing later, so this breaks a lot of assumptions. Yet Google surprised everyone and actually trained Gemini 3 with just that type, so I think a lot of people are reconsidering their assumptions.
This is not the case for LLMs. FP16/BF16 training precision is standard, with FP8 inference very common. But labs are moving to FP8 training and even FP4.
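For anyone unfamiliar with the BF16 trade-off mentioned above, a quick demo (uses the ml_dtypes package, which provides a numpy-compatible bfloat16; assumed installed): BF16 keeps FP32's exponent range but only a 7-bit mantissa, while FP16 has more mantissa bits but overflows early.

    import numpy as np
    import ml_dtypes   # pip install ml_dtypes

    x = np.float32(3e38)                  # near the top of FP32's range

    print(x.astype(np.float16))           # inf   -- FP16 tops out around 6.5e4
    print(x.astype(ml_dtypes.bfloat16))   # ~3e38 -- same range as FP32, far less precision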
When training a neural network, you usually play around with the architecture and need as much flexibility as possible. You need to support a large set of operations.
Another factor is that training is always done with batches. Inference batching depends on the number of concurrent users. This means training tends to be compute bound where supporting the latest data types is critical, whereas inference speeds are often bottlenecked by memory which does not lend itself to product differentiation. If you put the same memory into your chip as your competitor, the difference is going to be way smaller.
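A rough sketch of why that is, in terms of arithmetic intensity (numbers are illustrative, not tied to any particular chip): a big training matmul reuses each weight across a large batch, while a batch-of-one decode step reads every weight once.

    # FLOPs per byte moved for a (batch x d_in) @ (d_in x d_out) matmul in BF16.
    def arithmetic_intensity(batch, d_in, d_out, bytes_per_el=2):
        flops = 2 * batch * d_in * d_out
        bytes_moved = bytes_per_el * (batch * d_in + d_in * d_out + batch * d_out)
        return flops / bytes_moved

    print(arithmetic_intensity(4096, 8192, 8192))  # ~2000 FLOPs/byte -> compute bound
    print(arithmetic_intensity(1, 8192, 8192))     # ~1 FLOP/byte     -> memory bound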
That quote left me with the same question. Something about decent amount of ram on one board perhaps? That’s advantageous for training but less so for inference?
inference is often a static, bounded problem solvable by generic compilers. training requires the mature ecosystem and numerical stability of cuda to handle mixed-precision operations. unless you rewrite the software from the ground up like Google but for most companies it's cheaper and faster to buy NVIDIA hardware
> static, bounded problem
What does it even mean in neural net context?
> numerical stability
also nice to expand a bit.
I don't think what the article writes about matters all that much. Gemini 3 Pro is arguably not even the best model anymore, and it's _weeks_ old, and Google has far more resources than Anthropic does. If the hardware actually was the secret sauce, Google would be wiping the floor with little everyone else.
But they're not.
There's a few confounding problems:
1. Actually using that hardware effectively isn't easy. It's not as simple as jacking up some constant values and reaping the benefits. Actually using the hardware is hard, and by the time you've optimized for it, you're already working on the next model.
2. This is a problem that, if you're not Google, you can just spend your way out of. A model doesn't take a petabyte of memory to train or run. Regular old H100s still mostly work fine. Faster models are nice, but Gemini 3 Pro having 50% of the latency of Opus 4.5 or GPT 5.1 doesn't add enough value to matter to really anyone.
3. There's still a lot of clever tricks that work as low hanging fruit to improve almost everything about ML models. You can make stuff remarkably good with novel research without building your own chips.
4. A surprising amount of ML model development is boots on the ground work. Doing evals. Curating datasets. Tweaking system prompts. Having your own Dyson sphere doesn't obviate a lot of the typing and staring at a screen that necessarily has to be done to make a model half decent.
5. Fancy bespoke hardware means fancy bespoke failure modes. You can search stack overflow for CUDA problems, you can't just Bing your way to victory when your fancy TPU cluster isn't doing the thing you want it to do.
I think you are addressing the issue from a developer's perspective. I don't think TPUs are going to be sold to individual users anytime soon. What the article is pointing out is that Google is now able to squeeze significantly more performance per dollar than their peer competitors in the LLM space.
For example, OpenAI has announced trillion-dollar investments in data centers to continue scaling. They need to go through a middle-man (Nvidia), while Google does not, and will be able to use their investment much more efficiently to train and serve their own future models.
> Google is now able to squeeze significantly more performance per dollar than their peer competitors in the LLM space
Performance per dollar doesn't "win" anything though. Performance (as in speed) hardly cracks the top five concerns that most folks have when choosing a model provider, because fast, good models already exist at price points that are acceptable. That might mean slightly better margins for Google, but ultimately isn't going to make them "win"
Google owns 14% of Anthropic, and Anthropic is using Google TPUs, as well as AWS Trainium and of course GPUs. It isn't necessary for one company to create both the winning hardware and the winning software to be part of the solution. In fact, with the close race in software, hardware seems like the better bet.
https://www.anthropic.com/news/expanding-our-use-of-google-c...
They are using that hardware to wipe the floor with everyone if you look at the price per million tokens.
But price per token isn't even a directly important concern anymore. Anyone with a brain would pay 5x more per token for a model that uses 10x fewer tokens with the same accuracy. I've gone all in on Opus 4.5 because even though it's more expensive, it solves the problems I care about with far fewer tokens.
Gemini3 is slightly more expensive than GPT5.1 for both input and output tokens though?
Which model is doing so?
_Weeks_ old! What a fossil!
Slightly more seriously: what you say makes sense if and only if you're projecting Sam Altman and assuming that a) real legit superhuman AGI is just around the corner, and b) all the spoils will accrue to the first company that finds it, which means you need to be 100% in on building the next model that will finally unlock AGI.
But if this is not the case -- and it's increasingly looking like it's not -- it's going to continue to be a race of competing AIs, and that race will be won by the company that can deliver AI at scale the most cheaply. And the article is arguing that company will be Google.
> _Weeks_ old! What a fossil!
I think you are missing the point. They are saying "weeks old" isn't very old.
> it's going to continue to be a race of competing AIs, and that race will be won by the company that can deliver AI at scale the most cheaply.
I don't see how that follows at all. Quality and distribution both matter a lot here.
Google has some advantages but some disadvantages here too.
If you are on AWS GovCloud, Anthropic is right there. Same on Azure, and on Oracle.
I believe Gemini will be available on the Oracle Cloud at some point (it has been announced) but they are still behind in the enterprise distribution race.
OpenAI is only available on Azure, although I believe their new contract lets them strike deals elsewhere.
On the consumer side, OpenAI and Google are well ahead of course.
> _Weeks_ old! What a fossil!
Last week it looked like Google had won (hence the blog post) but now almost nobody is talking about antigravity and Gemini 3 anymore so yeah what op says is relevant
"Gemini 3 Pro is arguably not even the best model anymore"
Arguably indeed, because I think it still is.
It definitely depends on how you're measuring. But the benchmarks don't put it at the top for many ways of measuring, and my own experience doesn't put it at the top. I'm glad if it works for you, but it's not even a month old and there are lots of folks like me who see it as definitely worse for classes of problems that 3 Pro could be the best at.
Which is to say, if Google was set up to win, it shouldn't even be a question that 3 Pro is the best. It should be obvious. But it's definitely not obvious that it's the best, and many benchmarks don't support it as being the best.
On point 5, I think this is the real moat for CUDA. Does Google have tools to optimize kernels on their TPUs? Do they have tools to optimize successive kernel launches on their TPUs? How easy is it to debug on a TPU(arguably CUDA could use work here but still...)? Does Google help me fully utilize their TPUs? Can I warm up a model on a TPU, checkpoint it, and launch the checkpoints to save time?
I am fairly pro-Google (they invented the LLM, FFS...) and recognize the advantages (price/token, efficiency, vertical integration, established DCs w/ power allocations), but I also know they have a habit of slightly sucking at everything but search.
Fairly certain Google is aiming for "realtime" model training, which would definitely require a new architecture.
I didn't doubt it, but I also don't think realtime model training makes them "win" anything.
A question I don't see addressed in all these articles: what prevents Nvidia from doing the same thing and iterating on their more general-purpose GPU towards a more focused TPU-like chip as well, if that turns out to be what the market really wants.
They will, I'm sure.
The big difference is that Google is both the chip designer *and* the AI company. So they get both sets of profits.
Both Google and Nvidia contract TSMC for chips. Then Nvidia sells them at a huge profit. Then OpenAI (for example) buys them at that inflated rate and them puts them into production.
So while Nvidia is "selling shovels", Google is making their own shovels and has their own mines.
On top of that, Google is also a cloud infrastructure provider - unlike OpenAI, which needs someone like Azure to plug in those GPUs and host the servers.
The own shovels for own mines strategy has a hidden downside: isolation. NVIDIA sells shovels to everyone - OpenAI, Meta, xAI, Microsoft - and gets feedback from the entire market. They see where the industry is heading faster than Google, which is stewing in its own juices. While Google optimizes TPUs for current Google tasks (Gemini, Search), NVIDIA optimizes GPUs for all possible future tasks. In an era of rapid change, the market's hive mind usually beats closed vertical integration.
Aka vertical integration.
> AI ... profits
Citation needed. But the vertical integration is likely valuable right now, especially with NVidia being supply constrained.
So when the bubble pops, the companies making the shovels (TSMC, NVIDIA) might still have the money they got for their products, and some of the ex-AI companies might at least be able to sell standards-compliant GPUs on the wider market.
And Google will end up with lots of useless super specialized custom hardware.
Selling shovels may still turn out to be the right move: Nvidia got rich off the cryptocurrency bubble, now they're getting even richer off the AI bubble.
Having your own mines only pays off if you actually do strike gold. So far AI undercuts Google's profitable search ads, and loses money for OpenAI.
Deepmind gets to work directly with the TPU team to make custom modifications and designs specifically for deepmind projects. They get to make pickaxes that are made exactly for the mine they are working.
Everyone using Nvidia hardware has a lot of overlap in requirements, but they also all have enough architectural differences that they won't be able to match Google.
OpenAI announced they will be designing their own chips, exactly for this reason, but that also becomes another extremely capital intensive investment for them.
This also doesn't get into the fact that Google already has S-tier datacenters and datacenter construction/management capabilities.
Isn’t there a suspicion that OpenAI buying custom chips from another Sam Altman venture is just graft? Wasn’t that one of the things that came up when the board tried to out him?
> Deepmind gets to work directly with the TPU team to make custom modifications
You don't think Nvidia has field-service engineers and applications engineers with their big customers? Come on man. There is quite a bit of dialogue between the big players and the chipmaker.
It's not that the TPU is better than an NVidia GPU, it's just that it's cheaper since it doesn't have a fat NVidia markup applied, and is also better vertically integrated since it was designed/specified by Google for Google.
TPUs are also cheaper because GPUs need to be more general purpose whereas TPUs are designed with a focus on LLM workloads meaning there's not wasted silicon. Nothing's there that doesn't need to be there. The potential downside would be if a significantly different architecture arises that would be difficult for TPUs to handle and easier for GPUs (given their more general purpose). But even then Google could probably pivot fairly quickly to a different TPU design.
That's exactly what Nvidia is doing with tensor cores.
Except the native width of Tensor Cores is about 8-32 (depending on scalar type), whereas the width of TPUs is up to 256. The difference in scale is massive.
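Back-of-the-envelope on what that width difference buys per issued operation, treating both as square multiply-accumulate arrays (a simplification, using the widths quoted above):

    # MACs per cycle for a W x W multiply-accumulate array (simplified model).
    for name, width in [("tensor-core-scale tile", 16), ("TPU-scale systolic array", 256)]:
        print(f"{name}: {width * width:,} MACs/cycle")
    # 16x16   ->    256 MACs/cycle
    # 256x256 -> 65,536 MACs/cycle, ~256x more work per unit of issue/control overhead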
That's pretty much what they've been doing incrementally with the data center line of GPUs versus GeForce since 2017. Currently, the data center GPUs now have up to 6 times the performance at matrix math of the GeForce chips and much more memory. Nvidia has managed to stay one tape out away from addressing any competitors so far.
The real challenge is getting the TPU to do more general purpose computation. But that doesn't make for as good a story. And the point about Google arbitrarily raising the prices as soon as they think they have the upper hand is good old fashioned capitalism in action.
Nvidia doesn't have the software stack to do a TPU.
They could make a systolic array TPU and software, perhaps. But it would mean abandoning 18 years of CUDA.
The top post right now is talking about the TPU's colossal advantage in scaling & throughput. Ironwood is massively bigger & faster than what Nvidia is shooting for, already. And that's a huge advantage. But IMO that is a replicable win. Throw gobs more at networking and scaling and Nvidia could do similar with their architecture.
The architectural win of what TPU is more interesting. Google sort of has a working super powerful Connection Machine CM-1. The systolic array is a lot of (semi-)independent machines that communicate with nearby chips. There's incredible work going on to figure out how to map problems onto these arrays.
Whereas on a GPU, main memory is used to transfer intermediary results. It doesn't really matter who picks up work; there are lots of worklets with equal access time to that bit of main memory. The actual situation is a little more nuanced (even in consumer GPUs there are really multiple different main memories, which creates some locality), but there's much less need for data locality on a GPU, whereas the whole premise of the TPU is to exploit data locality: sending data to a neighbor is cheap, while storing and retrieving data from memory is slower and much more energy intensive.
CUDA takes advantage of, and relies strongly on, the GPU's main memory being (somewhat) globally accessible. There are plenty of workloads folks do in CUDA that would never work on a TPU, on these much more specialized data-passing systolic arrays. That's why TPUs are so amazing: they are much more constrained devices that require so much more careful workload planning to get the work to flow across the 2D array of the chip.
Google's work on projects like XLA and IREE is a wonderful & glorious general pursuit of how to map these big crazy machine learning pipelines down onto specific hardware. Nvidia could make their own or join forces here. And perhaps they will. But the CUDA moat would have to be left behind.
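For anyone who hasn't met one, here's a tiny software model of an output-stationary systolic array: each cell only ever consumes operands arriving from its neighbors on a shared clock and accumulates its result in place, so nothing bounces through main memory. Purely illustrative, not how real TPU hardware is programmed.

    import numpy as np

    def systolic_matmul(A, B):
        # Cell (i, j) accumulates C[i, j]. Rows of A stream in from the left,
        # columns of B from the top, skewed in time so operands meet at the right cell.
        n, k = A.shape
        _, m = B.shape
        C = np.zeros((n, m))
        for t in range(n + m + k - 2):                     # global clock ticks
            for i in range(n):
                for j in range(m):
                    s = t - i - j                          # which operand pair arrives now
                    if 0 <= s < k:
                        C[i, j] += A[i, s] * B[s, j]       # one MAC per cell per tick
        return C

    A, B = np.random.rand(4, 3), np.random.rand(3, 5)
    assert np.allclose(systolic_matmul(A, B), A @ B)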
> They could make a systolic array TPU and software, perhaps. But it would mean abandoning 18 years of CUDA.
Tensor cores are specialized and have CUDA support.
the entire organisation has been built over the last 25 years to produce GPUs
turning a giant lumbering ship around is not easy
For sure, I did not mean to imply they could do it quickly or easily, but I have to assume that internally at Nvidia there's already work happening to figure out "can we make chips that are better for AI and cheaper/easier to make than GPUs?"
It’s not binary. It’s not existential. What’s at stake for Nvidia is its HUGE profit margins. 5 years from now, Nvidia could be selling 100x as many chips. But its market cap could be a fraction of what it is now if competition is so intense that it’s making a 5% profit margin instead of 90%.
More like 900% right now.
My personal guess would be that what drives the cost and size of these chips is the memory bandwidth and the transceivers required to support it. Since transceivers/memory controllers are on the edge of the chip, you get a certain minimum circumference for a given bandwidth, which determines your minimum surface area.
It might even be 'free' to fill it with more complicated logic (especially logic that lets you write clever algorithms that save on bandwidth).
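The geometry argument, roughly: edge (where the transceivers live) grows linearly with die side length while area for logic grows quadratically, so past some point extra logic is nearly free relative to the bandwidth you already paid for. A sketch with made-up constants:

    # Toy model: I/O lives on the die edge, logic fills the area. Constants are invented.
    GBPS_PER_MM_EDGE = 100    # assumed achievable bandwidth per mm of die edge
    for side_mm in (10, 20, 30):
        edge_bw = 4 * side_mm * GBPS_PER_MM_EDGE
        area = side_mm ** 2
        print(f"{side_mm} mm die: {edge_bw} GB/s of edge I/O, {area} mm^2 for logic")
    # Doubling the side doubles edge bandwidth but quadruples the room for logic.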
They lose the competitive advantage. They have nothing more to offer than what Google has in-house.
Nothing in principle. But Huang probably doesn't believe in hyper specializing their chips at this stage because it's unlikely that the compute demands of 2035 are something we can predict today. For a counterpoint, Jim Keller took Tenstorrent in the opposite direction. Their chips are also very efficient, but even more general purpose than NVIDIA chips.
How is Tenstorrent h/w more general purpose than NVIDIA chips? TT hardware is only good for matmuls and some elementwise operations, and plain sucks for anything else. Their software is abysmal.
For users buying H200s for AI workloads, the "ASIC" tensor cores deliver the overwhelming bulk of performance. So they already do this, and have been since Volta in 2017.
To put it into perspective, the tensor cores deliver about 2,000 TFLOPs of FP8, and half that for FP16, and this is all tensor FMA/MAC (comprising the bulk of compute for AI workloads). The CUDA cores -- the rest of the GPU -- deliver more in the 70 TFLOP range.
So if data centres are buying nvidia hardware for AI, they already are buying focused TPU chips that almost incidentally have some other hardware that can do some other stuff.
I mean, GPUs still have a lot of non-tensor general uses in the sciences, finance, etc, and TPUs don't touch that, but yes a lot of nvidia GPUs are being sold as a focused TPU-like chip.
Is it the CUDA cores that run the vertex/fragment/etc. shaders in normal GPUs? Where do the ray tracing units fit in? How much of a modern Nvidia GPU is general purpose vs specialized to graphics pipelines?
> what prevents Nvidia from doing the same thing and iterating on their more general-purpose GPU towards a more focused TPU-like chip as well, if that turns out to be what the market really wants.
Nothing prevents them per se, but it would risk cannibalising their highly profitable (IIRC 50% margin) higher end cards.
- ASIC won the crypto mining battle in the past, it's orders of magnitude faster
- Google not only owns the technology but builds a cohesive cloud around it; Tesla and Meta work on their own ASIC AI chips, and I'd guess others do too
- A signal has already been given: SoftBank sold its entire Nvidia stake and Berkshire added Google to its portfolio.
Microsoft "has" a lot of companies' data, and Google is probably building the most advanced AI cloud.
However, I can't help thinking: they had a cloud which was light-years ahead of AWS 15 years ago and now GCP is no. 3; they also released open-source GPT-style models more than 5 years ago that constituted the foundation for OpenAI's closed-source models.
If Google won, it would cannibalize its current ad-driven business and replace it with something that is extremely expensive to run and difficult to make profit from. A Pyrrhic win essentially.
But that would happen regardless of who won, so better to at least dominate the new paradigm and figure out how to extract value from it. I also suspect that once the value generation is figured out, they will cease offering these APIs to anyone; if you had a golden goose, would you rent it out?
Hardly a Pyrrhic win. When the rest of the market is burning money, whoever burns money the slowest while still remaining competitive will win.
They could go all dark mirror and inject ads directly into the responses.
I mean, focus is a thing that Google has always struggled with. But I kind of doubt that customers who need online marketing (ads) are going to convert en masse to users who rent cloud TPUs instead.
I have read in the past that ASICs for LLMs are not as simple a solution compared to cryptocurrency. In order to design and build the ASIC you need to commit to a specific architecture: a hashing algorithm for a cryptocurrency is fixed but the LLMs are always changing.
Am I misunderstanding "TPU" in the context of the article?
LLMs require memory and interconnect bandwidth, so they need a whole package that is capable of feeding data to the compute. Crypto is 100% compute bound. Crypto is a trivially parallelized application that runs the same calculation over N inputs.
Regardless of architecture (which is anyways basically the same for all LLMs), the computational needs of modern neural networks are pretty generic, centered around things like matrix multiply, which is what the TPU provides. There is even TPU support for some operations built into PyTorch - it is not just a proprietary interface that Google use themselves.
"Application-specific" doesn't necessarily mean unprogrammable. Bitcoin miners aren't programmable because they don't need to be. TPUs are ASICs for ML and need to be programmable so they can run different models. In theory, you could make an ASIC hardcoded for a specific model, but given how fast models evolve, it probably wouldn't make much economic sense.
It’s true that architectures change, but they are built from common components. The most important of those is matrix multiplication, using a relatively small set of floating point data types. A device that accelerates those operations is, effectively, an ASIC for LLMs.
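To make that concrete, here's a minimal sketch of where the FLOPs in a decoder block actually go (single head, no KV cache, residuals/norms omitted, hypothetical sizes): everything here except the softmax and the ReLU is a plain matmul, which is exactly what a matmul ASIC accelerates.

    import numpy as np

    def decoder_block(x, Wq, Wk, Wv, Wo, W1, W2):
        # x: (seq, d_model); every heavy op below except softmax/ReLU is a matmul.
        q, k, v = x @ Wq, x @ Wk, x @ Wv                    # QKV projections
        att = np.exp(q @ k.T / np.sqrt(q.shape[-1]))        # attention scores
        att /= att.sum(-1, keepdims=True)                   # softmax
        x = att @ v @ Wo                                    # attention output projection
        return np.maximum(x @ W1, 0) @ W2                   # MLP: two more matmuls

    d = 64
    rng = np.random.default_rng(0)
    x = rng.standard_normal((16, d)) * 0.1
    Wq, Wk, Wv, Wo = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))
    W1 = rng.standard_normal((d, 4 * d)) / np.sqrt(d)
    W2 = rng.standard_normal((4 * d, d)) / np.sqrt(4 * d)
    print(decoder_block(x, Wq, Wk, Wv, Wo, W1, W2).shape)   # (16, 64)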
We used to call these things DSPs
Cryptocurrency architectures also change - Bitcoin is just about the lone holdout that never evolves. The hashing algorithm for Monero is designed so that a Monero hashing ASIC is literally just a CPU, and it doesn't even matter what the instruction set is.
The funniest thing about this story is that NVIDIA has essentially become a TPU company. Look at the Hopper and Blackwell architectures: Tensor Cores are taking up more space, the Transformer Engine has appeared, and NVLink has started to look like a supercomputer interconnect. Jensen Huang isn't stupid. He saw the threat of specialized ASICs and just built the ASIC inside the GPU. Now we have a GPU that is 80% matrix multiplier but still keeps CUDA compatibility. Google tried to kill the GPU, but instead forced the GPU to mutate into a TPU
There's an issue with building a Swiss Army knife chip that supports everything back to the 80s: it works great until it doesn't (Intel).
The part that surprised me is how much TPUs gain from the systolic array design. It basically cuts down the constant memory shuffling that GPUs have to do, so more of the chip’s time is spent actually computing.
The downside is the same thing that makes them fast: they’re very specialized. If your code already fits the TPU stack (JAX/TensorFlow), you get great performance per dollar. If not, the ecosystem gap and fear of lock-in make GPUs the safer default.
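For what it's worth, "fits the TPU stack" can be as little code as this if you're already in JAX: the same jitted function compiles through XLA to whatever backend is attached (a sketch; assumes JAX is installed, and glosses over the real work of sharding and performance tuning).

    import jax
    import jax.numpy as jnp

    @jax.jit
    def layer(x, w):
        return jax.nn.relu(x @ w)

    x = jnp.ones((1024, 1024), dtype=jnp.bfloat16)
    w = jnp.ones((1024, 1024), dtype=jnp.bfloat16)

    print(jax.devices())          # TPU cores on a TPU VM, GPUs or CPU elsewhere
    print(layer(x, w).shape)      # same code either way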
Given the importance of scale for this particular product, any company placing itself on "just" one layer of the whole story is at a heavy disadvantage, I guess. I'd rather have a winning google than openai or meta anyway.
> I'd rather have a winning google than openai or meta anyway.
Why? To me, it seems better for the market, if the best models and the best hardware were not controlled by the same company.
I agree, it would be the best of bad cases, in a sense. I have low trust in OpenAI due to its leadership, and in Meta, because, well, Meta has history, let's say.
5 days ago: https://news.ycombinator.com/item?id=45926371
Sparse models have the same quality of results but fewer coefficients to process - in the case described in the link above, sixteen (16) times fewer.
This means that these models need 8 times less data to store, can be 16 or more times faster, and use 16+ times less energy.
TPUs are not all that good in the case of sparse matrices. They can be used to train dense versions, but inference efficiency with sparse matrices may not be all that great.
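Presumably the "16x fewer coefficients but only 8x less storage" arithmetic comes from having to keep indices alongside the surviving values; a rough sketch of that bookkeeping under an assumed CSR-style format with 2-byte values and 2-byte indices:

    n = 1_000_000                        # dense parameter count
    dense_bytes = 2 * n                  # bf16 values only
    sparse_bytes = (2 + 2) * (n // 16)   # 1-in-16 kept: value + column index each
    print(dense_bytes / sparse_bytes)    # ~8x smaller, not 16x, because of the indices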
TPUs do include dedicated hardware, SparseCores, for sparse operations.
https://docs.cloud.google.com/tpu/docs/system-architecture-t...
https://openxla.org/xla/sparsecore
SparseCores appear to be block-sparse as opposed to element-sparse. They use 8- and 16-wide vectors to compute.
Here's another inference-efficient architecture where TPUs are useless: https://arxiv.org/pdf/2210.08277
There is no matrix-vector multiplication. Parameters are estimated using Gumbel-Softmax. TPUs are of no use here.
Inference is done bit-wise and most efficient inference is done after application of boolean logic simplification algorithms (ABC or mockturtle).
In my (not so) humble opinion, TPUs are an example of premature optimization.
It's all about the right mindset at the very top level. At the beginning of the PC era, nobody would have bet on IBM losing. Same at the dawn of the internet: all the money was on MS. So it happened to Nokia and Ericsson too.
Google is a giant without a direction. The ads money is so good that it just doesn't have the guts to leave it on the table.
With its AI offerings, can Google suck the oxygen out of AWS? AWS grew big because of compute. The AI spend will be far larger than compute. Can Google launch AI/Cloud offerings with free compute bundled? Use our AI, and we'll throw in compute for free.
It's a cool subject and article and things I only have a general understanding of (considering the place of posting).
What I'm sure about is that a processing unit purpose-built for a task is more efficient than a general-purpose unit designed to accommodate all programming tasks.
More and more, the economics of programming boils down to energy usage and, invariably, to physical rules; a more efficient process has the benefit of consuming less energy.
As a layman, that makes general sense to me. Maybe a future where productivity is based more on energy efficiency than on monetary gain pushes the economy in better directions.
Cryptocurrency and LLMs seem like they'll play out that story over the next 10 years.
How much of current GPU and TPU design is based around attention's bandwidth-hungry design? The article makes it seem like TPUs aren't very flexible, so big model architecture changes - like new architectures that don't use attention - may lead to useless chips. That being said, I think it is great that we have some major competing architectures out there. GPUs, TPUs and UMA CPUs are all attacking the ecosystem in different ways, which is what we need right now. Diversity in all things is always the right answer.
Google has always had great tech - their problem is the product or the perseverance, conviction, and taste needed to make things people want.
This is a bizarre argument to make for AI, since Google started working on TPUs in 2013 (12 years ago) and Sundar started publicly banging on about being an AI-first company in 2016. They missed the first boat on LLMs, but Google has been invested in AI for way longer than any of the competition.
https://aibusiness.com/companies/google-ceo-sundar-pichai-we...
Their incentive structure doesn't lead to longevity. Nobody gets promoted for keeping a product alive, they get promoted for shipping something new. That's why we're on version 37 of whatever their chat client is called now.
I think we can be reasonably sure that search, Gmail, and some flavor of AI will live on, but other than that, Google apps are basically end-of-life at launch.
It's also, paradoxically, the talent in tech that isolates them. The internal tech stack is so incredibly specialized that most Google products have to be built either for internal users or for external users.
Agree there are lots of other contributing causes like culture, incentives, security, etc.
Google released their latest chat app 8 years ago.
It's telling that basically all of Google's successful projects were either acquisitions or were sponsored directly by the founders (or sometimes, were acquisitions that were directly sponsored by the founders). Those are the only situations where you are immune from the performance review & promotion process.
Odd way to describe the most used product in the history of the world.
Fuchsia or me?
DeepSeek kind of innovated on this using off-the-shelf components, right?
to quote from their paper "In order to ensure sufficient computational performance for DualPipe, we customize efficient cross-node all-to-all communication kernels (including dispatching and combining) to conserve the number of SMs dedicated to communication. The implementation of the kernels is codesigned with the MoE gating algorithm and the network topology of our cluster."
> The GPUs were designed for graphics [...] However, because they are designed to handle everything from video game textures to scientific simulations, they carry “architectural baggage.” [...] A TPU, on the other hand, strips away all that baggage. It has no hardware for rasterization or texture mapping.
With simulations becoming key to training models doesn't this seem like a huge problem for Google?
Any chance of a bit of support for jax-metal, or incorporating apple silicon support into Jax?
I wish we had more options for a dedicated/stand-alone TPU for end users. I recently bought a 2019 Coral, which as far as I know is my only option.
Coral really has little to do with modern TPUs.
All this assumes that LLMs are the sole mechanism for AI and will remain so forever: no novel architectures (neither hardware nor software), no progress in AI theory, nothing better than LLMs, simply brute force LLM computation ad infinitum.
Perhaps the assumptions are true. The mere presence of LLMs seems to have lowered the IQ of the Internet drastically, sopping up financial investors and resources that might otherwise be put to better use.
TPUs predate LLMs by a long time. They were already being used for all the other internal ML work needed for search, youtube, etc.
That's incorrect. TPUs can support many ML workloads, they're not exclusive to LLMs.
That and the fact they can self-fund the whole AI venture and don't require outside investment.
The most fun fact about all the developments post-ChatGPT is that people apparently forgot that Google was doing actual AI before AI meant (only) ML and GenAI/LLMs, and they were top players at it.
Arguably main OpenAI raison d'être was to be a counterweight to that pre-2023 Google AI dominance. But I'd also argue that OpenAI lost its way.
And they forgot to pay those people so most of them left.
That and they were harvesting data way before it was cool, and now that it is cool, they're in a privileged position since almost no-one can afford to block GoogleBot.
They do voluntarily offer a way to signal that the data GoogleBot sees is not to be used for training, for now, and assuming you take them at their word, but AFAIK there is no way to stop them doing RAG on your content without destroying your SEO in the process.
But they also collect the data without causing denial of service, and respect robots.txt, which is more than you can say of most LLM scrapers...
Do people still get organic search traffic from google?
Wow, they really got folks by the short hairs if that is true...
At this stage, it is somewhat clear that it doesn't really matter who's ahead in the race, cause everyone else is super close behind...
I have never understood why, in these discussions, nobody brings up other specialized silicon providers like Groq, SambaNova, or my personal favorite, Cerebras.
Cerebras CS-3 specs:
• 4 trillion transistors
• 900,000 AI cores
• 125 petaflops of peak AI performance
• 44GB on-chip SRAM
• 5nm TSMC process
• External memory: 1.5TB, 12TB, or 1.2PB
• Trains AI models up to 24 trillion parameters
• Cluster size of up to 2048 CS-3 systems
• Memory B/W of 21 PB/s
• Fabric B/W of 214 Pb/s (~26.75 PB/s)
Comparing GPU to TPU is helpful for showcasing the advantages of the TPU in the same way that comparing CPU to Radeon GPU is helpful for showcasing the advantages of GPU, but everyone knows Radeon GPU's competition isn't CPU, it's Nvidia GPU!
TPU vs GPU is new paradigm vs old paradigm. GPUs aren't going away even after they "lose" the AI inference wars, but the winner isn't necessarily guaranteed to be the new paradigm chip from the most famous company.
Cerebras inference remains the fastest on the market to this day to my knowledge due to the use of massive on-chip SRAM rather than DRAM, and to my knowledge, they remain the only company focused on specialized inference hardware that has enough positive operating revenue to justify the costs from a financial perspective.
I get how valuable and important Google's OCS interconnects are, not just for TPUs or inference, but really as a demonstrated PoC for computing in general. Skipping the E-O-E translation in general is huge and the entire computing hardware industry would stand to benefit from taking notes here, but that alone doesn't automatically crown Google the victor here, does it?
Then Groq should reign emperor?
Will Google sell TPUs that can be plugged into stock hardware, or custom hardware with lots of TPUs? Our customers want all their video processing to happen on site, and don't want their video or other data to touch the cloud, so they're not happy about renting cloud TPUs or GPUs. Also it would be nice to have smart cameras with built-in TPUs.
Why don't your customers trust Google Cloud?
It's not Google Cloud per se, it's any cloud. There are a million reasons not to trust (or spend money on) any cloud. They want all their video and data on premises and completely under their control.
Yes, but what's at the finish line? The bottom?
You can't really buy a TPU; you have to buy the entire data center that includes the TPU, plus the services and support. In Google Colab, I often don't prefer the TPU either, because the AI documentation isn't written for it. While this could all change in the long term, I also don't see these changes in Google's long-term strategy. There's also the problem of Google's graveyard, which isn't mentioned in the long-term view of the original article. Combined, these factors leave me still skeptical about Google's lead on AI.
Right because people would love to get locked into another even more expensive platform.
That's mentioned in the article, but is the lock-in really that big? In some cases, it's as easy as changing the backend of your high-level ML library.
That is like how every ORM promises you can just swap out the storage layer.
In practice it doesn't quite work out that way.
That's what it is on paper. But in practice you trade one set of hardware idiosyncrasies for another and unless you have the right people to deal with that, it's a hassle.
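For reference, the optimistic version of "just change the backend" looks roughly like this with PyTorch/XLA (a sketch; assumes the torch_xla package and a TPU VM). The hassle people are describing lives in everything around this snippet: input pipelines, sharding, recompilation stalls, debugging.

    import torch
    import torch_xla.core.xla_model as xm

    device = xm.xla_device()                     # a TPU core when run on a TPU VM

    model = torch.nn.Linear(1024, 1024).to(device)
    x = torch.randn(8, 1024, device=device)
    y = model(x)
    xm.mark_step()                               # flush the lazily traced XLA graph
    print(y.shape)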
I think you can only run on Google Cloud, not AWS, bare metal, Azure, etc.
https://killedbygoogle.com
That's actually one of the reasons why Google might win.
Nvidia is tied down to support previous and existing customers while Google can still easily shift things around without needing to worry too much about external dependencies.
It's all small products which didn't receive traction.
It's not though. Chromecast, G Suite legacy, Podcasts, Music, the URL shortener... These weren't small products.
Google Hangouts wasn't small. Google+ was big and supposedly "the future" and is the canonical example of a huge misallocation of resources.
Google will have no problem discontinuing Google "AI" if they finally notice that people want a computer to shut up rather than talk at them.
Wait until Apple's Chromebook competitor shows up to eat their lunch; likewise, switching to another proprietary stack with no dev ecosystem will die out. Sure, they'll go after big-ticket accounts; also take a guess at what else gets sanctioned next.
This is the “Microsoft will dominate the Internet” stage.
The truth is the LLM boom has opened the first major crack in Google as the front page of the web (the biggest since Facebook), in the same way the web in the long run made Windows so irrelevant Microsoft seemingly don’t care about it at all.
Exactly. ChatGPT pretty much ate away ad volume & retention, as if the already-garbage search results weren't enough. Don't even get me started on Android & Android TV as an ecosystem.
That's not the story that GOOG's quarterly earnings reports tell (ad revenue up 12% YoY).
They can only privatize the AI race.
If Google wins, we all lose.
How high are the chances that as soon as China produces their own competitive TPU/GPU, they'll invade Taiwan in order to starve the West in regards to processing power, while at the same time getting an exclusive grip on the Taiwanese Fabs?
China will invade Taiwan when they start losing, not when they're increasingly winning.
As long as "tomorrow" is a better day to invade Taiwan than today is, China will wait for tomorrow.
Their demographics beg to differ.
The US would destroy TSMC before letting China have it. China also views military conquest of Taiwan as less than ideal for a number of reasons, so I think right now it's seen as a potential defensive move in the face of American aggression.
Imo having the best logic process nodes is not necessary to win at AI - having the most memory bandwidth is - and China has SOTA HBMs.
I'd guess most of their handicap comes from their hardware and software not being as refined as the US's
Seems low at the moment, with the concept of a "G2" being floated as a general understanding of China's ascension to the position Russia used to hold, effectively recreating a bipolar, semi-Cold War world order. Mind, I am not saying it's impossible, but there are reasons China would want to avoid this scenario (it's probably one of the few things the US would not tolerate and would likely retaliate against).
If they have the fabs but ASML doesn't send them their new machines, they will just end up in the same situation as now, just one generation later. If China wants to compete, they need to learn how to make the EUV light and mirrors.
The fabs would be destroyed in such a situation. The West would absolutely play that card in negotiations.
Not very. Those fabs are vulnerable things, shame if something happens to them. If China attacks, it would be for various other reasons and processors are only one of many considerations, no matter how improbable it might sound to an HN-er.
What if China becomes self-sufficient enough to no longer rely on Taiwanese Fabs, and hence having no issues with those Fabs getting destroyed. That would put China as the leader once and for all.
Highly unlikely. Despite the rampant anti-Chinese FUD that's so prevalent in the media (and, sadly, here on HN), China isn't really in the habit of invading other lands.
The plot twist here is that China doesn't view Taiwan as foreign.
In my 20+ years of following NVIDIA, I have learned to never bet against them long-term. I actually do not know exactly why they continually win, but they do. The main issue is that they have a 3-4 year gap between wanting a new design pivot and realizing it (silicon has a long "pipeline"), so when it seems that they may be missing a new trend or swerve in the demands of the market, it is often simply because of this delay.
You could have said the same thing about Intel for ~50 years.
Depends on the top management though. I imagine Nvidia will keep doing well while Jensen Huang is running things.
Fair, but the 75% margins can be reduced to 25% with healthy competition. The lack of competition in the frontier chips space was always the bottleneck to commoditization of computation, if such a thing is even possible
Turkeys bet on tomorrow 364 days of the year.
I told you a thousand times, you have to sell your pumpkin stock before Halloween, before!