Comment by airhangerf15
7 days ago
An H100 is a $20k USD card and has 80GB of vRAM. Imagine a 2U rack server with $100k of these cards in it. Now imagine an entire rack of these things, plus all the other components (CPUs, RAM, passive cooling or water cooling) and you're talking $1 million per rack, not including the costs to run them or the engineers needed to maintain them. Even the "cheaper"
I don't think people realize the size of these compute units.
When the AI bubble pops is when you're likely to be able to realistically run good local models. I imagine some of these $100k servers going for $3k on eBay in 10 years, and a lot of electricians being asked to install new 240v connectors in makeshift server rooms or garages.
What do you mean 10 years?
You can pick up a DGX-1 on eBay right now for less than $10k: 256 GB of VRAM (HBM2, no less), NVLink, 512 GB of RAM, 40 CPU cores, 8 TB of SSD, and 100 Gbit HBAs. Equivalent non-NVIDIA-branded machines are around $6k.
They are heavy, noisy like you would not believe, and a single one just about maxes out a 16A 240V circuit, which also means it produces 13 000 BTU/hr of waste heat.
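Back-of-envelope for that figure, assuming the machine really does load the circuit to its full 16 A:

    # A fully loaded 16 A, 240 V circuit, with all of it ending up as heat in the room.
    volts, amps = 240, 16
    watts = volts * amps                   # 3840 W drawn (and eventually dissipated)
    btu_per_hr = watts * 3.412             # 1 W ≈ 3.412 BTU/hr  ->  ~13,100 BTU/hr
    tons_of_cooling = btu_per_hr / 12_000  # 1 ton of refrigeration = 12,000 BTU/hr  ->  ~1.1 tons
    print(watts, round(btu_per_hr), round(tons_of_cooling, 2))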
Fair warning: the BMCs on those suck, and the firmware bundles are painful, since you need a working NVIDIA-specific container runtime to apply them, which you might not be able to get up and running because of a firmware bug that presents almost all of the RAM as nonvolatile.
Are there better paths you would suggest? Any hardware people have reported better luck with?
1 reply →
It's not waste heat if you only run it in the winter.
Only if you ignore that both gas furnaces and heat pumps are more efficient than resistive loads.
15 replies →
Seasonality in git commit frequency
> 13 000 BTU/hr
In sane units: 3.8 kW
You mean 1.083 tons of refrigeration
> In sane units: 3.8 kW
5.1 Horsepower
5 replies →
How many football fields of power?
The choice of BTU/hr was firmly tongue in cheek for our American friends.
You’ll need (2) 240V 20A 2P breakers, one for the server and one for the 1-ton mini-split to remove the heat ;)
A matching AC would only need about 1/4 the power, right? Assuming you don't already have a way to remove the heat.
8 replies →
Just air freight them from 60 degrees North to 60 degrees South and vice versa every 6 months.
Well, get a heat pump with a good COP of 3 or more, and you won't need quite as much power ;)
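Roughly, taking the ~3.8 kW heat load from above and treating these COP values as assumptions rather than anything from a spec sheet:

    heat_load_kw = 3.84      # waste heat to move, from the 16 A / 240 V estimate above
    for cop in (3.0, 4.0):   # COP = heat moved per unit of electricity consumed
        print(f"COP {cop}: ~{heat_load_kw / cop:.2f} kW of electricity to pump out {heat_load_kw} kW of heat")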
1 reply →
> “They are heavy, noisy like you would not believe, … produces … waste heat.”
Haha. I bought a 20-year-old IBM server off eBay for a song. It was fun for a minute, then it became a doorstop and I sold it as pickup-only on eBay for $20. A beast. Never again will I have one in my home.
That's about the era when my company was an IBM reseller. Once I was kneeling behind 8x 1U servers starting up and all the fans went to max speed for 3 seconds. Never put rackmount hardware in a room near anything living.
Get an AS/400. Those were actually expected to be installed in an office rather than a server room. It might still be perceived as loud at home, but it won't be deafening and probably not louder than some gaming rigs.
Are you talking about the guy in Temecula running two different auctions with some of the same photos (356878140643 and 357146508609, both showing a missing heat sink?) Interesting, but seems sketchy.
How useful is this Tesla-era hardware on current workloads? If you tried to run the full DeepSeek R1 model on it at (say) 4-bit quantization, any idea what kind of TTFT and TPS figures might be expected?
I can't speak to the Tesla stuff, but I run an Epyc 7713 with a single 3090, and by creatively splitting the model between the GPU and 8 channels of DDR4 I can do about 9 tokens per second on a q4 quant.
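If that's a DeepSeek-R1-class MoE (an assumption, the comment doesn't say), a crude bandwidth-bound ceiling works out to roughly that number; the DDR4-3200 speed is also assumed, and this ignores whatever layers live in the 3090's VRAM:

    # Decode is roughly memory-bandwidth-bound: every token has to stream the active weights.
    channels = 8
    gbps_per_channel = 25.6                       # DDR4-3200: ~25.6 GB/s per channel (assumed)
    ram_bandwidth = channels * gbps_per_channel   # ~205 GB/s theoretical
    active_params = 37e9                          # R1 activates ~37B parameters per token (assumed model)
    gb_per_token = active_params * 0.5 / 1e9      # ~4-bit quant -> ~18.5 GB read per token
    print(ram_bandwidth / gb_per_token)           # ~11 tok/s ceiling; real-world overheads land you lower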
1 reply →
Tesla doesn't support 4-bit float.
> What do you mean 10 years?
Didn’t the DGX-1 come out 9 years ago?
Even if the AI bubble does not pop, your prediction about those servers being available on eBay in 10 years will likely be true, because some datacenters will simply upgrade their hardware and resell the old machines to third parties.
Would anybody buy the hardware though?
Sure, datacenters will get rid of the hardware, but only because it's no longer commercially profitable to run them, presumably because compute demands have eclipsed their abilities.
It's kind of like buying a used GeForce 980 Ti in 2025. Would anyone buy one and run it other than out of nostalgia or curiosity? The power draw alone makes them uneconomical to run.
Much more likely, every single H100 that exists today becomes e-waste in a few years. If you need H100-level compute, you'll be able to buy it in the form of new hardware for way less money that consumes way less power.
For example, if you actually want 980 Ti-level compute in a desktop today, you can just buy an RTX 5050, which is ~50% faster, consumes half the power, and can be had for $250 brand new. Oh, and it is well supported by modern software stacks.
Off topic, but I bought my (still in active use) 980ti literally 9 years ago for that price. I know, I know, inflation and stuff, but I really expected more than 50% bang for my buck after 9 whole years…
> Sure, datacenters will get rid of the hardware, but only because it's no longer commercially profitable to run them, presumably because compute demands have eclipsed their abilities.
I think the existence of a pretty large secondary market for enterprise servers and such kind of shows that this won't be the case.
Sure, if you're AWS and what you're selling _is_ raw compute, then couple-generation-old hardware may not be sufficiently profitable for you anymore... but there are a lot of other places that hardware could be applied, with different requirements or higher margins, where it may still be.
Even if they're only running models a generation or two out of date, there are a lot of use cases today, with today's models, that will continue to work fine going forward.
And that's assuming it doesn't get replaced for some other reason that only applies when you're trying to sell compute at scale. A small uptick in the failure rate may make a big dent at OpenAI but not for a company that's only running 8 cards in a rack somewhere and has a few spares on hand. A small increase in energy efficiency might offset the capital outlay to upgrade at OpenAI, but not for the company that's only running 8 cards.
I think there's still plenty of room in the market in places where running inference "at cost" would be profitable that are largely untapped right now because we haven't had a bunch of this hardware hit the market at a lower cost yet.
1 reply →
I have around a thousand Broadwell cores in 4-socket systems that I got for ~nothing from these sorts of sources... pretty useful. (I mean, I guess literally nothing, since I extracted the storage backplanes and sold them for more than the systems cost me.) I try to run tasks during low-power-cost hours on Zen 3/4 unless it's gonna take weeks just running on those, and if it will, I crank up the rest of the cores.
And 40 P40 GPUs that cost very little. They're a bit slow, but with 24 GB per GPU they're pretty useful for memory-bandwidth-bound tasks (and not horribly noncompetitive in terms of watts per TB/s).
Given highly variable time-of-day power prices, it's also pretty useful to just get 2x the computing power (at low cost) and run it during the cheap periods.
So I think datacenter scrap is pretty useful.
It's interesting to think about scenarios where that hardware would get used only part of the time, like say when the sun is shining and/or when dwelling heat is needed. The biggest sticking point would seem to be all of the capex for connecting them to do something useful. It's a shame that PLX switch chips are so expensive.
The 5050 doesn't support 32-bit PhysX, so a bunch of games would be missing a ton of stuff. You'd still need the 980 running alongside it for older PhysX games, because NVIDIA.
Except their insane electricity demands will still be the same, meaning nobody will buy them. There are plenty of SPARC servers on eBay.
There is also a community of users known for not making sane financial decisions and keeping older technologies working in their basements.
1 reply →
This seems likely. Blizzard even sold off old World of Warcraft servers; you can still get them on eBay.
Someone's take on AI was that we're collectively investing billions in data centers that will be utterly worthless in 10 years.
Unlike investments in railways or telephone cables or roads or any other sort of infrastructure, this investment has a very short lifespan.
Their point was that whatever your take on AI, the present investment in data centres is a ridiculous waste and will end up as a huge net loss compared to most other things our societies could spend the money on.
Maybe we'll invent AGI and they'll be proven wrong because the data centres will pay for themselves many times over, but I suspect they'll ultimately be proved right and it'll all end up as landfill.
The servers may well be worthless (or at least worth a lot less), but that's been pretty much true for a long time. Not many people want to run on 10-year-old servers (although I pay $30/month for a dedicated server with dual Xeon L5640s or something like that, which is about 15 years old).
The servers will be replaced and the networking equipment will be replaced. But the building will still be useful, the fiber that was pulled to internet exchanges and the like will still be useful, and the wiring to the electric utility will still be useful (although I've certainly heard stories of datacenters where much of the floor space is unusable because rack power density has increased and the power distribution is maxed out).
3 replies →
If it is all a waste and a bubble, I wonder what the long-term impact of the infrastructure upgrades around these DCs will be. A lot of new HV wires and substations are being built out, and cities are expanding around clusters of DCs. Are they setting themselves up for a new rust belt?
3 replies →
Sure, but what about the collective investment in smartphones, digital cameras, laptops, even cars? Not much modern technology is useful and practical after 10 years, let alone 20. AI is probably moving a little faster than normal, but technology depreciation is not limited to AI.
If a coal-powered electric plant is next to the data center, you might be able to get electricity cheap enough to keep it going.
Datacenters could go into the business of making personal PCs or workstations out of the older NVIDIA cards and selling them.
They probably are right, but a counterargument could be how people thought going to the moon was pointless and insanely expensive, yet the technology to put stuff in space and run GPS and comms satellites probably paid that back 100x.
6 replies →
This isn’t my original take but if it results in more power buildout, especially restarting nuclear in the US, that’s an investment that would have staying power.
Utterly? Moore's law per watt is dead, and lower-power units can run electric heating for small towns!
My personal sneaking suspicion is that publicly offered models are using way less compute than people think. Modern mixture-of-experts models use top-k routing, where only a few experts are evaluated per token, meaning even SOTA models aren't using much more compute than a 70-80B dense (non-MoE) model.
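A minimal sketch of the idea, with toy NumPy shapes only and not any particular model's router:

    import numpy as np

    def moe_layer(x, experts, router_w, k=2):
        """Toy mixture-of-experts layer: each token only runs its top-k experts."""
        logits = x @ router_w                        # (tokens, n_experts) router scores
        topk = np.argsort(logits, axis=-1)[:, -k:]   # indices of the k best experts per token
        out = np.zeros_like(x)
        for t in range(x.shape[0]):
            scores = logits[t, topk[t]]
            gates = np.exp(scores - scores.max())
            gates /= gates.sum()                     # softmax over just the selected experts
            for gate, e in zip(gates, topk[t]):
                out[t] += gate * experts[e](x[t])    # only k experts run; the others cost nothing
        return out

    # 8 experts, but each token only pays for 2 of them.
    d = 16
    experts = [lambda v, W=np.random.randn(d, d) * 0.1: v @ W for _ in range(8)]
    router_w = np.random.randn(d, 8) * 0.1
    print(moe_layer(np.random.randn(4, d), experts, router_w).shape)   # (4, 16)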
To piggyback on this: at the enterprise level these days, the question really isn't "how are we going to serve all these users". It comes down to the fact that investors believe they will eventually see a return on investment, and will pay whatever is needed to get the infra.
Even without any optimization of job scheduling, they would just build as many warehouses as necessary, filled with as many racks as necessary, to serve the required user base.
As a non-American the 240V thing made me laugh.
What I wonder is what this means for Coreweave, Lambda and the rest, who are essentially just renting out fleets of racks like this. Does it ultimately result in acquisition by a larger player? Severe loss of demand? Can they even sell enough to cover the capex costs?
It means they're likely going to be left holding a very expensive bag.
These are also depreciating assets.
I wonder if it's feasible to hook up NAND flash with a high bandwidth link necessary for inference.
Each of these NAND chips has hundreds of flash dies stacked inside, and they are all hooked up to the same data line, so only one of them can talk at a time, and they still achieve >1 GB/s of bandwidth. If you could hook them up in parallel, you could have hundreds of GB/s of bandwidth per chip.
NAND is very, very slow relative to RAM, so you'd pay a huge performance penalty there. But maybe more importantly my impression is that memory contents mutate pretty heavily during inference (you're not just storing the fixed weights), so I'd be pretty concerned about NAND wear. Mutating a single bit on a NAND chip a million times over just results in a large pile of dead NAND chips.
No, it's not slow: a single NAND chip in an SSD offers >1 GB/s of bandwidth. Inside the chip there are 100+ dies actually holding the data, but in SSDs only one of them is active when reading/writing.
You could probably make special NAND chips where all of the dies can be active at the same time, which means you could get 100+ GB/s of bandwidth out of a single chip.
This would be useless for data storage scenarios, but very useful when you have huge amounts of static data you need to read quickly.
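Rough arithmetic, with the die count and per-die bandwidth as illustrative guesses rather than datasheet numbers:

    dies_per_package = 100        # order-of-magnitude guess for a high-capacity stack
    gbps_per_die = 1.0            # ~1 GB/s, roughly what a package delivers today via one active die
    package_gbps = dies_per_package * gbps_per_die   # ~100 GB/s per package if all dies streamed in parallel
    print(package_gbps, 3000 / package_gbps)         # vs ~3 TB/s of HBM on an H100 -> ~30 packages for parity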
3 replies →
An RTX 6000 Pro (NVIDIA Blackwell GPU) has 96 GB of VRAM and can currently be had for around $7,700 (at least, that's the lowest price I've found). It plugs into standard PC motherboard PCIe slots. The Max-Q edition has slightly less performance but a max TDP of only 300 W.
They'll be in landfill in 10 years.
Four H100s in a 2U server didn't sound impressive, but that is accurate:
>A typical 1U or 2U server can accommodate 2-4 H100 PCIe GPUs, depending on the chassis design.
>In a 42U rack with 20x 2U servers (allowing space for switches and PDU), you could fit approximately 40-80 H100 PCIe GPUs.
Why stop at 80 H100s for a mere 6.4 terabytes of GPU memory?
Supermicro will sell you a full rack loaded with servers [1] providing 13.4 TB of GPU memory.
And with 132 kW of power draw, you can heat an Olympic-sized swimming pool by 1°C every day with that rack alone. That's almost as much power as 10 mid-sized cars cruising at 50 mph.
[1] https://www.supermicro.com/en/products/system/gpu/48u/srs-gb...
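The pool figure roughly checks out, assuming a ~2,500 m³ Olympic pool and all 132 kW ending up in the water:

    rack_kw = 132
    pool_liters = 2_500_000      # 50 m x 25 m x 2 m
    j_per_liter_per_c = 4186     # specific heat of water
    joules_per_day = rack_kw * 1000 * 86_400
    print(joules_per_day / (pool_liters * j_per_liter_per_c))   # ~1.09 °C per day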
> as much power consumption as 10 mid-sized cars cruising at 50 mph
Imperial units are so weird
What about https://www.cerebras.ai/system?
And the big hyperscaler cloud providers are building city-block-sized data centers stuffed to the gills with these racks, as far as the eye can see.
Yeah, I think the crux of the issue is that ChatGPT is serving a huge number of users, including paid users, and is still operating at a massive loss. They are spending truckloads of money on GPUs and selling access at a loss.
This isn’t like how Google was able to buy up dark fiber cheaply and use it.
From what I understand, this hardware has a high failure rate over the long term, especially because of the heat it generates.
> When the AI bubble pops is when you're likely to be able to realistically run good local models.
After years of “AI is a bubble, and will pop when everyone realizes they’re useless plagiarism parrots” it’s nice to move to the “AI is a bubble, and will pop when it becomes completely open and democratized” phase
It's not even been 3 years. Give it time. The entire boom and bust of the dot-com bubble took 7 years.