Comment by swiftcoder
5 months ago
> having 10,000 H100s just means that you can do 625 times more experiments than s1 did
I think the ball is very much in their court to demonstrate they actually are using their massive compute in such a productive fashion. My BigTech experience would tend to suggest that frugality went out the window the day the valuation took off, and they are in fact just burning compute for little gain, because why not...
This is pure speculation on my part, but I think at some point a company's valuation became tied to how big its compute is, so everybody jumped on the bandwagon.
I don't think you need to speculate too hard. On CNBC they are not tracking revenue, profits, or technical breakthroughs, but how much the big companies are spending (on GPUs). That's the metric!
I probably don't have to repeat it, but this is a perfect example of Goodhart's Law: when a metric is used as a target, it loses its effectiveness as a metric.
If you were a reporter who didn't necessarily understand how to value a particular algorithm or training operation, but you wanted a simple number to compare how much work OpenAI vs. Google vs. Facebook are putting into their models, yeah, it makes sense. How many petaflops their datacenters are churning through in aggregate is probably correlated with the thing you're trying to understand. And it's probably easier to look at their financials and correlate how much they've spent on GPUs with how many petaflops of compute they have.
But when your investors are giving you more money based on how well they perceive you're doing, and their perception is not an oracle but is instead directly based on how much money you're spending... the GPUs don't actually need to do anything other than make number go up.
This feels like one of those stats they show from 1929 and everyone is like “and they didn’t know they were in a bubble?”
> but how much the big companies are spending (on gpus). That's the metric!
Burn rate based valuations!
The 2000's are back in full force!
"But tulip sales keep increasing!"
They absolutely are tracking revenues/profits on CNBC, what are you talking about?
Matt Levine talked about this tangentially on his podcast this past Friday (or was it the one before?). His point was that valuing these companies by the size of their compute is reasonable, since those chips are very valuable. At a minimum, the chips are an asset that can act as collateral.
I hear this a lot, but what the hell. It's still computer chips. They depreciate. Short supply won't last forever. Hell, GPUs burn out. It seems like using ice sculptures as collateral, and then spring comes.
> It was a good way to value these companies according to their compute size since those chips are very valuable.
Are they actually, though? Presently yes, but are they actually driving ROI? Or just an asset nobody really is meaningfully utilizing, but helps juice the stocks?
I asked this elsewhere, but, I don't fully understand the reason for the critical GPU shortage.
Isn't it because NVIDIA insists on only using the latest nodes from a single company (TSMC) for manufacture?
I don't understand why we can't use older process nodes to boost overall GPU making capacity.
Can't we have tiers of GPU availability: some on cutting-edge nodes, others built on older Intel and Samsung nodes?
Why is Nvidia not diversifying aggressively to Samsung and Intel, no matter the process node?
Can someone explain?
I've heard packaging is also a concern, but can't you get Intel to figure that out with a large enough commitment?
(Also, I know NVIDIA has some capacity on Samsung. But why not go all out, even using Global Foundries?)
That's a great way to value a company that is going bankrupt.
But I'm not going to value an operating construction company based on how many shovels or excavators they own. I'm going to want to see them putting those assets to productive use.
That works if you are a cloud provider renting them out.
Otherwise, you'd better keep them humming while trying to find a business model, because they certainly aren't getting any newer as chips.
So, "No one was ever fired for ... buying more server infrastructure."
Walmart has massive, idle datacenters full of running machines doing nothing.
Mainly it points to a non-scientific "bigger is better" mentality, and the researchers probably didn't mind playing around with the power because "scale" is "cool".
Remember that the Lisp AI-lab people were working on unsolved problems on absolute potatoes of computers back in the day. We have a semblance of progress now, but so much of it has been brute force (even if there have been real improvements in the field).
The big question is whether this insane spending has pulled the rug out from under real progress and we head into another AI winter of disillusionment, or whether there is enough real progress just around the corner to give investors hope in a post-DeepSeek valuation hangover.
We are in a phase where costs are really coming down. We had a phase from GPT-2 to about GPT-4 where the key to building better models was just building bigger models and training them for longer. But since then a lot of work has gone into distillation and other techniques to make smaller models more capable.
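The thread doesn't spell out what distillation involves, so here is a rough sketch of the classic logit-distillation loss, assuming PyTorch; the function name, temperature, and mixing weight are illustrative, not anything from the comment. The student model is trained to match the teacher's softened output distribution in addition to the true labels:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target term: KL divergence between temperature-softened
    # student and teacher distributions (scaled by T^2).
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

The point being: the big model's outputs become training signal for a much smaller one, which is part of why capability per parameter keeps improving.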
If there is another AI winter, it will be more like the dot-com bubble: a lot of important work got done during the bubble, but many of the big tech companies were built on the fruits of that labor in the decade after it burst.
Besides that, AI training (aka gradient descent) is not really an "embarrassingly parallel" problem. At some point, there are diminishing returns on adding more GPUs, even though a lot of effort is going into making it as parallel as possible.
What? It definitely is.
Data parallelism, model parallelism, parameter server to workers, MoE itself can be split up, etc.
But even if it wasn’t, you can simply parallelize training runs with slight variations in hyperparameters. That is what the article is describing.
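To make that last point concrete, here is a minimal sketch of the trivially parallel case, assuming an 8-GPU node; the `train.py` script and its flags are hypothetical stand-ins for whatever training entry point you already have. Each run is completely independent, which is exactly the "embarrassingly parallel" setup:

```python
import itertools
import os
import subprocess

# Hypothetical hyperparameter grid; train.py stands in for your training script.
learning_rates = [1e-4, 3e-4, 1e-3, 3e-3]
batch_sizes = [32, 64]
configs = list(itertools.product(learning_rates, batch_sizes))

procs = []
for i, (lr, bs) in enumerate(configs):
    # Pin each run to its own GPU; no gradient synchronization between runs.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(i % 8))
    cmd = ["python", "train.py", f"--lr={lr}", f"--batch-size={bs}"]
    procs.append(subprocess.Popen(cmd, env=env))

for p in procs:
    p.wait()
```

Data and model parallelism within a single run are harder, since they do require synchronization, but a sweep like this needs none.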
This claim is mathematically nonsensical. It implies a more-or-less linear relationship, that more is always better. But there's no reason to limit that to H100s. Conventional servers are, if anything, rather more established in their ability to generate value. By which I mean: however much potential AI servers may have to eventually matter more than conventional servers, we know how to use conventional servers to generate value now.
And thus, by this logic, every company in the world should just be buying as many servers as they can get their hands on, because More Servers = More Value.
Obviously, this is not happening. It doesn't take much analysis to start listing the many and manifold reasons why, and many of those reasons apply to GPUs as well. If everything in AWS got 10x faster overnight, that would not create a situation where everyone suddenly starts grabbing more AWS servers. Obviously everyone would start trimming down, even if in a few years' time they found some way to use that burst of power such that they could use more later. This can't happen overnight, though. It would take time, and not "weeks" or "months" but "years" at scale.
Incorporating the important variable of time in the analysis, if AIs become literally hundreds of times cheaper to run, today, then it is perfectly logical that the near-term demand for the hardware to run them is also going to go way, way down. However much potential AI may have, it is fairly clear looking out at the AI landscape right now that there isn't really anyone out there unlocking vast amounts of value and sitting there wringing their hands because they just can't get more GPU compute. The GPU rush has been from fear that someone will figure out how to "really" unlock AI and then they'll be stuck without the hardware to compete.
It may be the case that vastly cheaper AI will in fact be part of unlocking that value, and that as the AI industry grows it will grow faster as a result... but that's still going to be on a multi-year time frame, not a tomorrow time frame. And all those GPUs and all those valuations are still broadly based on them being valuable real soon now, not in a few years, and all those GPU purchases are on the assumption they need them now, or on a timeframe where we can't be waiting around, rather than waiting for some rounds of exponential doublings to bring price down. The hardware curve in 5 years may be higher but the curve in the next year would be lower, and by a lot.
And, you know, who's to say we're done? I doubt there's another 100x in there, but is someone going to eke out another 2x improvement? Or a 10x improvement? Making it easier to run lots of experiments makes it much more likely for that to happen. I'm skeptical of another 10x general improvement but 10x improvements for specific, important use cases I can't rule out.
Edit: I should also point out this is an extremely common pattern in technology in general. Often the very hardest part is producing a thing that does a particular task at all. Once we have it in hand, once we can use it and learn how it operates and what its characteristic operating modes are, once we can try modifications to it in the real world and see what happens, optimizing it becomes much easier, sometimes explosively so by comparison. Taking any first iteration of a tech that is practical and then trying to straight-line demand based on it is silly, in all sorts of ways and all directions. The internal combustion engine, for example, has had a myriad of impacts on the world and certainly after various improvements many, many millions if not billions of them have been made... but any company that reacted to the first couple of cars and just went ballistic buying those first-generation internal combustion engines would have lost everything, and rather quickly.
For starters, every employee has an H100 under their desk.