Comment by bastawhiz
19 hours ago
I don't think what the article writes about matters all that much. Gemini 3 Pro is arguably not even the best model anymore, it's _weeks_ old, and Google has far more resources than Anthropic does. If the hardware really were the secret sauce, Google would be wiping the floor with everyone else.
But they're not.
There are a few confounding problems:
1. Actually using that hardware effectively isn't easy. It's not as simple as jacking up some constant values and reaping the benefits, and by the time you've optimized for the hardware, you're already working on the next model.
2. This is a problem that, if you're not Google, you can just spend your way out of. A model doesn't take a petabyte of memory to train or run. Regular old H100s still mostly work fine. Faster models are nice, but Gemini 3 Pro having half the latency of Opus 4.5 or GPT 5.1 doesn't add enough value to matter to really anyone.
3. There's still a lot of clever tricks that work as low hanging fruit to improve almost everything about ML models. You can make stuff remarkably good with novel research without building your own chips.
4. A surprising amount of ML model development is boots on the ground work. Doing evals. Curating datasets. Tweaking system prompts. Having your own Dyson sphere doesn't obviate a lot of the typing and staring at a screen that necessarily has to be done to make a model half decent.
5. Fancy bespoke hardware means fancy bespoke failure modes. You can search Stack Overflow for CUDA problems, but you can't just Bing your way to victory when your fancy TPU cluster isn't doing the thing you want it to do.
I think you are addressing the issue from a developer's perspective. I don't think TPUs are going to be sold to individual users anytime soon. What the article is pointing out is that Google is now able to squeeze significantly more performance per dollar than their peer competitors in the LLM space.
For example, OpenAI has announced trillion-dollar investments in data centers to continue scaling. They need to go through a middle-man (Nvidia), while Google does not, and will be able to use their investment much more efficiently to train and serve their own future models.
> Google is now able to squeeze significantly more performance per dollar than their peer competitors in the LLM space
Performance per dollar doesn't "win" anything though. Performance (as in speed) hardly cracks the top five concerns that most folks have when choosing a model provider, because fast, good models already exist at price points that are acceptable. That might mean slightly better margins for Google, but ultimately isn't going to make them "win".
Google owns 14% of Anthropic, and Anthropic is using Google TPUs as well as AWS Trainium and, of course, GPUs. It isn't necessary for one company to create both the winning hardware and the winning software to be part of the solution. In fact, with the software race this close, hardware seems like the better bet.
https://www.anthropic.com/news/expanding-our-use-of-google-c...
They are using that hardware to wipe the floor with everyone if you look at the price per million tokens.
But price per token isn't even a directly important concern anymore. Anyone with a brain would pay 5x more per token for a model that uses 10x fewer tokens with the same accuracy. I've gone all in on Opus 4.5 because even though it's more expensive, it solves the problems I care about with far fewer tokens.
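To make that arithmetic concrete, here's a quick back-of-the-envelope sketch. All the numbers are made up (they're not real pricing for any of these models); the point is just that a 5x price premium is a win if the model needs 10x fewer tokens for the same result:

```python
# Hypothetical numbers only: a pricier model that solves the task in far
# fewer tokens can still be cheaper per solved problem.

cheap_price = 2.00      # $ per million output tokens (made up)
cheap_tokens = 200_000  # tokens the cheap model burns on the task

pricey_price = 10.00    # 5x the price per token
pricey_tokens = 20_000  # but 10x fewer tokens for the same accuracy

cheap_cost = cheap_price * cheap_tokens / 1_000_000
pricey_cost = pricey_price * pricey_tokens / 1_000_000

print(f"cheap model:  ${cheap_cost:.2f} per task")   # $0.40
print(f"pricey model: ${pricey_cost:.2f} per task")  # $0.20
```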
Gemini 3 is slightly more expensive than GPT 5.1 for both input and output tokens, though?
Which model is doing so?
_Weeks_ old! What a fossil!
Slightly more seriously: what you say makes sense if and only if you're projecting Sam Altman and assuming that a) real legit superhuman AGI is just around the corner, and b) all the spoils will accrue to the first company that finds it, which means you need to be 100% in on building the next model that will finally unlock AGI.
But if this is not the case -- and it's increasingly looking like it's not -- it's going to continue to be a race of competing AIs, and that race will be won by the company that can deliver AI at scale the most cheaply. And the article is arguing that company will be Google.
> _Weeks_ old! What a fossil!
I think you are missing the point. They are saying "weeks old" isn't very old.
> it's going to continue to be a race of competing AIs, and that race will be won by the company that can deliver AI at scale the most cheaply.
I don't see how that follows at all. Quality and distribution both matter a lot here.
Google has some advantages but some disadvantages here too.
If you are on AWS GovCloud, Anthropic is right there. Same on Azure, and on Oracle.
I believe Gemini will be available on the Oracle Cloud at some point (it has been announced) but they are still behind in the enterprise distribution race.
OpenAI is only available on Azure, although I believe their new contract lets them strike deals elsewhere.
On the consumer side, OpenAI and Google are well ahead of course.
> _Weeks_ old! What a fossil!
Last week it looked like Google had won (hence the blog post), but now almost nobody is talking about Antigravity and Gemini 3 anymore, so yeah, what the OP says is relevant.
"Gemini 3 Pro is arguably not even the best model anymore"
Arguably indeed, because I think it still is.
It definitely depends on how you're measuring. But the benchmarks don't put it at the top by many measures, and my own experience doesn't put it at the top either. I'm glad if it works for you, but it's not even a month old, and there are lots of folks like me who find it definitely worse on the classes of problems where 3 Pro ought to be the best.
Which is to say, if Google were set up to win, it shouldn't even be a question whether 3 Pro is the best. It should be obvious. But it's definitely not obvious that it's the best, and many benchmarks don't support it as being the best.
On point 5, I think this is the real moat for CUDA. Does Google have tools to optimize kernels on their TPUs? Do they have tools to optimize successive kernel launches on their TPUs? How easy is it to debug on a TPU (arguably CUDA could use work here, but still...)? Does Google help me fully utilize their TPUs? Can I warm up a model on a TPU, checkpoint it, and launch the checkpoints to save time? (A rough sketch of what that workflow looks like today is below.)
I am fairly pro-Google (they invented the LLM, FFS...) and recognize the advantages (price/token, efficiency, vertical integration, established DCs w/ power allocations), but I also know they have a habit of slightly sucking at everything but search.
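For what it's worth, the standard answer on the Google side is JAX/XLA: you write array code, jit it, and inspect traces in TensorBoard/Perfetto rather than hand-tuning kernels. A minimal sketch of that workflow (log path and shapes are made up, and this says nothing about how pleasant it is when something goes wrong on a real cluster):

```python
# Illustrative only: the usual JAX path for compiling and profiling on TPU.
import jax
import jax.numpy as jnp

@jax.jit  # XLA compiles the whole function into fused device kernels
def step(w, x):
    return jnp.tanh(x @ w)

w = jnp.ones((1024, 1024))
x = jnp.ones((256, 1024))

# Capture a trace you can open in TensorBoard / Perfetto.
with jax.profiler.trace("/tmp/jax-trace"):  # arbitrary log dir
    step(w, x).block_until_ready()  # force execution inside the trace window
```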
Fairly certain Google is aiming for "realtime" model training, which would definitely require a new architecture.
I don't doubt it, but I also don't think realtime model training makes them "win" anything.