
Comment by Octoth0rpe

4 months ago

> Krishna also referenced the depreciation of the AI chips inside data centers as another factor: "You've got to use it all in five years because at that point, you've got to throw it away and refill it," he said

This doesn't seem correct to me, or at least it's built on several shaky assumptions. You'd only have to 'refill' your hardware if:

- AI accelerator cards all start dying around the 5-year mark, which is possible given the heat density/cooling needs, but doesn't seem all that likely.

- Technology advances such that only the absolute newest cards can be used to run _any_ model profitably, which only seems likely if we see some pretty radical advances in efficiency. Otherwise, assuming your hardware is stable after 5 years of burn-in, it seems like you could continue to run older models on it at only the cost of the floorspace/power. Maybe you need new cards for new models for some reason (a new FP format that only new cards support? some magic amount of RAM? etc.), but there may be room for revenue from older/less capable models at a discounted rate.

Isn't that what Michael Burry is complaining about? That five years is actually too generous when it comes to depreciation of these assets, and that companies are being too relaxed with that estimate? The real depreciation is more like 2-3 years for these GPUs that cost tens of thousands of dollars apiece.

https://x.com/michaeljburry/status/1987918650104283372

  • That's exactly the thing. It's only about bookkeeping.

    The big AI corps keep pushing GPU depreciation further into the future, no matter how long the hardware is actually useful. Some of them are now at 6 years. But GPUs are advancing fast, and new hardware brings more flops per watt, so there's a strong incentive to switch to the latest chips. Also, they run 24/7 at 100% capacity, so after only 1.5 years a fair share of the chips are already toast. How much hardware do they have on their books that's actually not useful anymore? No one knows!

    Slower depreciation means more profit right now (for those companies that actually make a profit, like MS or Meta), but it's just kicking the can down the road. Eventually, all these investments have to come off the books, and that's where they will eat into profits. In 2024, the big AI corps invested about $1 trillion in AI hardware; next year is expected to be $2 trillion. The interest payments alone on that are crazy. And all of this comes on top of the fact that none of these companies actually makes any profit at all with AI (except Nvidia, of course). There's just no way this will pan out.
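    To put rough numbers on the bookkeeping point, here's a minimal sketch (the capex figure is invented) of how the assumed useful life changes the expense hitting this year's income statement:

    ```python
    # Toy illustration, made-up numbers: how the assumed useful life changes
    # the straight-line depreciation expense on this year's income statement.

    def annual_depreciation(purchase_cost: float, useful_life_years: int) -> float:
        """Straight-line depreciation: spread the cost evenly over the useful life."""
        return purchase_cost / useful_life_years

    gpu_capex = 100_000_000_000  # hypothetical $100B of accelerators

    for life in (3, 5, 6):
        expense = annual_depreciation(gpu_capex, life)
        print(f"{life}-year schedule: ${expense / 1e9:.1f}B expense per year")

    # 3-year schedule: $33.3B expense per year
    # 5-year schedule: $20.0B expense per year
    # 6-year schedule: $16.7B expense per year
    # Same cash out the door either way; the longer schedule just defers
    # the hit to reported profit.
    ```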

    • > It's only about bookkeeping.

      > Some of them are now at 6 years.

      There are three distinct but related topics here; it's not "just about bookkeeping" (though Michael Burry may be pointing specifically at the bookkeeping being misquoted):

      1. Financial depreciation - accounting principles typically follow the useful life of the capital asset (simply put, if an airplane typically gets used for 30 years, the cost of purchasing it is split equally across those 30 years on the books). Getting this right matters mostly for how future purchases get financed, since it drives how the bookkeepers show profitability, balance sheets, etc. Cash flow is ultimately what can make a company insolvent.

      2. Useful life - per number 1 above - this is the estimated and actual life of the asset. So if the airplane is actually used over 35 years, not 30, its actual useful life is 35 years. This is your point about "some of them are now at 6 years." Here is where this gets super tricky with GPUs: (a) we don't actually know what the useful life is or will be for these GPUs (hence Michael Burry's question), and (b) the cost side is going to get complicated fast. Say (I'm making these up) GPU X2000 has 2x the performance of GPU X1000 and your whole data center is full of GPU X1000. Do you replace all of those GPUs to increase throughput? (A rough sketch of that trade-off follows after this list.)

      3. Support & maintenance - this is what actually gets supported by the vendor. There doesn't seem to be any public info for the Nvidia GPUs, but typically these contracts run 3-5 years (usually tied to the useful life) and can often be extended. Again, this is going to get super complicated financially, because we don't know what future performance improvements to GPUs might happen (which would necessitate replacing old ones and renewing maintenance contracts).
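      To make the point-2 trade-off concrete, here's a back-of-the-envelope sketch; every number in it is invented, matching the hypothetical X1000/X2000 above:

      ```python
      # Back-of-the-envelope replace-or-keep comparison. Every number here is
      # invented for illustration, using the hypothetical X1000/X2000 above.

      old_cards = 10_000        # existing fleet of GPU X1000
      old_perf = 1.0            # normalized throughput per card
      new_perf = 2.0            # GPU X2000: 2x throughput per card
      new_card_cost = 30_000    # assumed $ per X2000
      revenue_per_perf = 5_000  # assumed $/year revenue per unit of throughput

      # Option A: keep the paid-off X1000 fleet.
      keep_revenue = old_cards * old_perf * revenue_per_perf

      # Option B: replace with X2000s (same rack count, double throughput).
      replace_capex = old_cards * new_card_cost
      replace_revenue = old_cards * new_perf * revenue_per_perf

      extra_revenue = replace_revenue - keep_revenue
      print(f"Replacement capex:      ${replace_capex / 1e6:.0f}M")
      print(f"Extra revenue per year: ${extra_revenue / 1e6:.0f}M")
      print(f"Naive payback:          {replace_capex / extra_revenue:.1f} years")
      # With these made-up numbers the payback is 6 years, i.e. longer than
      # the depreciation schedule itself - the answer is not obvious.
      ```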

    • > Also, they run 24/7 at 100% capacity, so after only 1.5 years

      How does OpenAI keep this load? I would expect the load at 2pm Eastern to be WAY bigger than the load after California goes to bed.

      3 replies →

    • Flops per watt is relevant for a new data center build-out where you're bottlenecked on electricity, but I'm not sure it matters so much for existing data centers. Electricity is such a small percentage of total cost of ownership. The marginal cost of running a 5-year-old GPU for 2 more years is small. The husk of a data center is cheap; it's the cooling, power delivery equipment, networking, GPUs, etc. that cost money, and when you retrofit data centers for the latest and greatest GPUs you have to throw away a lot of good equipment. It makes more sense to build new data centers as long as inference demand doesn't level off.

  • How different is this from rental car companies turning over their fleets? I don't know; this is a genuine question. The cars cost 3-4x as much and last about 2x as long, as far as I know, and the secondary market is still alive.

    • > How different is this from rental car companies changing over their fleets?

      New generations of GPUs leapfrog each other in efficiency (performance per watt); vehicles don't. Cars don't get exponentially better every 2-3 years, which is why the second-hand market is alive and well. Some of us are quite happy driving older cars (two parked outside our home right now, both well over 100,000 km driven).

      If you have a datacentre with older hardware, and your competitor has the latest hardware, you face the same physical space constraints, same cooling and power bills as they do? Except they are "doing more" than you are...

      Maybe we could call it "revenue per watt"?

      6 replies →

    • Rental car companies aren’t offering rentals at deep discount to try to kickstart a market.

      It would be much less of a deal if these companies were profitable and could cover the costs of renewing hardware, like car rental companies can.

    • I think it's a bit different because a rental car generates direct revenue that covers its cost. These GPU data centers are being used to train models (which themselves quickly become obsolete) and provide inference at a loss. Nothing in the current chain is profitable except selling the GPUs.

      6 replies →

    • > the secondary market is still alive.

      This is the crux: if a newer model comes out with better efficiency, will there be a secondary market to sell these data center cards into?

      It could be that second-hand AI hardware going into consumers' hands is how they offload it without huge losses.

      13 replies →

I think it's illustrative to consider the previous computation cycle, i.e. cryptomining, which passed through a similar lifecycle with energy and GPU accelerators.

The need for cheap wattage forced operations to arbitrage location for the cheapest/most reliable existing supply - there was rarely new buildout, as the cost was to be reimbursed by the coins the mining pool recovered.

On the chip side, the situation caused the same appreciation in GPU cards, with periodic offloading of cards to the secondary market (after wear and tear) as newer/faster/more efficient cards came out, until custom ASICs took over the heavy lifting and the GPU card market pivoted.

Similarly, in the short to medium term, the uptick of custom ASICs like Google's TPU will definitely make a dent in both capex/opex, and potentially also lead to a market of used GPUs as ASICs dominate.

So for GPUs I can certainly see the 5-year horizon making an impact on investment decisions as ASICs proliferate.

It's just the same dynamic as old servers. They still work fine, but power costs make them uneconomical compared to the latest tech.

  • It's far more extreme: old servers are still okay on I/O, and memory latency etc. won't change that dramatically, so you can still find productive uses for them. AI workloads are hyper-focused on a single type of work and, unlike most regular servers, the hardware is a limiting factor in direct competition with other companies.

    • I mean, you could use training GPUs for inference, right? That would be use case number 1 for an 8x A100 box in a couple of years. It could also be used for non-I/O-limited things like folding proteins or other 'scientific' use cases. Push comes to shove, I'm sure an old A100 will run Crysis.

      4 replies →

  • LambdaLabs is still making money off their Tesla V100s, A100s, and A6000s. The older ones can still run some models and are very cheap, so if that's all you need, that's what you'll pick.

    The V100 was released in 2017, and the A100 and A6000 in 2020.

  • That could change with a power generation breakthrough. If power is very cheap, then running ancient gear till it falls apart starts making more sense.

    • Power consumption is only part of the equation. More efficient chips => less heat => lower cooling costs and/or higher compute density in the same space.

      1 reply →

    • Hugely unlikely.

      Even if the power is free you still need a grid connection to move it to where you need it, and, guess what, the US grid is bursting at the seams. This is not just due to data center demand; it was struggling to cope with the transition away from coal well before that point.

      You also can’t buy a gas turbine for love nor money at the moment, and they’re not ever going to be free.

      If you plonked massive amounts of solar panels and batteries in the Nevada desert, that’s becoming cheap but it ain’t free, particularly as you’ll still need gas backup for a string of cloudy days.

      If you think SMRs are going to be cheap I have a bridge to sell you, you’re also not going to build them right next to your data centre because the NRC won’t let you.

      So that leaves fusion or geothermal. Geothermal is not presently “very cheap” and fusion power has not been demonstrated to work at any price.

  • I'm a little bit curious about this. Where does all the hardware from the big tech giants usually go once they've upgraded?

    • In-house hyperscaler stuff gets shredded, after every single piece of flash storage gets first drilled through and every hard drive gets bent by a hydraulic press. Then it goes into the usual e-waste recycling stream (i.e. it gets sent to poor countries where precious metals get extracted by people with a halved life expectancy).

      Off-the-shelf enterprise gear has a chance to get a second life through remarketing channels, but much of it also gets shredded due to dumb corporate policies. There are stories of some companies refusing to offload a massive decom onto the second hand market as it would actually cause a crash. :)

      It's a very efficient system, you see.

      1 reply →

    • I use (relatively) ancient servers (5-10 years old) because their performance is completely adequate; they just use slightly more power. As a plus, it's easy to buy spare parts, and they run on DDR3, so I'm not paying the current "RAM tax". I generally get such a server, max out its RAM, max out its CPUs, and put it to work.

      10 replies →

    • Some is sold on the used market; some is destroyed. There are plenty of used V100 and A100 available now for example.

  • Manipulating this for creative accounting seems to be the root of Michael Burry's argument, although I'm not fluent enough in his figures to map them here. But it's interesting to see IBM argue a similar case (somewhat), and comments ITT hitting the same known facts, in light of Nvidia's counterpoints to him.

    • Burry just did his first interview in many years: https://youtu.be/nsE13fvjz18?t=265

      It's with Michael Lewis, about 30 mins long. Highlights: he thinks we are near the top, and his puts are for two years' time. If you go long, he suggests healthcare stocks. He's been long gold for some years and thinks bitcoin is dumb. He thinks this is dotcom bubble #2, except instead of pro investors it's mostly index funds this time. Most recent headlines about him have been bad reporting.

  • > They still work fine but power costs make them uneconomical compared to latest tech.

    That's not necessarily the driving financial decision; in fact, I'd argue companies making data center hardware purchases barely look at this number. It's simpler than that: their support runs out, and it's cheaper to buy a new piece of hardware (that IS more efficient) because the hardware vendors make extended support inordinately expensive.

    Put yourself in the shoes of a salesperson at Dell selling enterprise server hardware and you'll see why this model makes sense.

  • Eh, not exactly. If you don't run the CPU at 70%+, a machine a model generation or two behind isn't that much less efficient.

    It used to be that a new server could use half the power of the old one at idle, but vendors figured out a while ago that servers also need proper power management, and it's much better now.

    The last few generations' gains could be summed up as "a low % increase in efficiency, with increases in TDP, memory channels, and core count."

    So for loads that aren't CPU-bound, the savings on a newer generation aren't nearly worth the replacement, and for bulk storage the CPU's power usage is an even smaller part of the total.

    • Single-thread performance and storage are definitely the main reasons not to use an old server. A 6-year-old server didn't have NVMe drives, so SATA SSD at best. That's a major slowdown if disk is important.

      Aside from that, there's no reason not to use a dual-socket server from 5 years ago instead of a single-socket server of today. Power and reliability maybe not as good.

      1 reply →

  • That was then. Now, high-end chips are reaching 4, 3, 2 nm, and the power savings aren't that high anymore. What's the power saving going from 4 nm to 2 nm?

5 years is long, actually. This is not a GPU thing. It's standard for server hardware.

  • Because usually it's more efficient for companies to retire the hardware and put in new stuff.

    Meanwhile, my 10-15 year old server hardware keeps chugging along just fine in the rack in my garage.

    • I thought the same until I calculated that newer hardware consumes a few times less energy, and for something running 24x7 that adds up quite a bit (I live in Europe, where energy is quite expensive).

      So my homelab equipment is just 5 years old and it will get replaced in 2-3 years with something even more power efficient.
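      For a rough sense of that math (the wattages and electricity price below are assumptions, not measurements):

      ```python
      # Rough annual energy cost comparison for a 24x7 homelab box.
      # Wattages and the electricity price are assumptions for illustration.

      hours_per_year = 24 * 365
      price_per_kwh = 0.35  # EUR, roughly a European household rate

      old_server_watts = 250  # assumed draw for a 10-15 year old server
      new_server_watts = 60   # assumed draw for modern low-power hardware

      def yearly_cost(watts: float) -> float:
          """Energy cost per year in EUR for a constant load."""
          return watts / 1000 * hours_per_year * price_per_kwh

      delta = yearly_cost(old_server_watts) - yearly_cost(new_server_watts)
      print(f"Old: ~{yearly_cost(old_server_watts):.0f} EUR/yr, "
            f"new: ~{yearly_cost(new_server_watts):.0f} EUR/yr, "
            f"difference: ~{delta:.0f} EUR/yr")
      # ~767 vs ~184 EUR/yr: a gap that can pay for replacement hardware
      # within a few years at European prices.
      ```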

      7 replies →

    • Sample size of 1, though. It's like how I've had hard disks last a decade, but a 100-node Hadoop cluster had 3 die per week after a few years.

      1 reply →

    • "Just fine". Presumably you're not super-concerned with the energy costs? People who run data centres pretty much _have_ to be.

  • 5 years is a long time for GPUs, maybe, but normal servers have 7-year lifespans in many cases, FWIW.

    These GPUs, I assume, have potential longevity issues due to their density; if you could cool them really, really well, I imagine there'd be no problem.

    • Normal servers are rarely run flat-out. These GPUs are supposed to be run that way. So, yeah, age is going to be a problem, as will cooling.

But if your competitor is running newer chips that consume less power per operation, aren't you forced to upgrade as well and dispose of the old hardware?

  • Sure, assuming the power cost reduction or capability increase justifies the expenditure. It's not clear that that will be the case; that's one of the shaky assumptions I'm referring to. It may be that the 2030 Nvidia accelerators will save you $2,000 in electricity per month per rack, and you can upgrade the whole rack for the low, low price of $800,000! That may not be worth it at all. If it saves you $200k per rack, or unlocks some additional capability that a 2025 accelerator is incapable of and customers are willing to pay for, then that's a different story. There are a ton of assumptions in these scenarios, and his logic doesn't seem to justify the confidence level.

    • > Sure, assuming the power cost reduction or capability increase justifies the expenditure. It's not clear that that will be the case.

      Share price is a bigger consideration than any +/- difference[1] between expenditure and the productivity delta. GAAP allows some flexibility in how servers are depreciated, so depending on what the company wants to signal to shareholders (investing in infra for future returns vs curtailing costs), it may make sense to shorten or lengthen the depreciation time regardless of the actual TCO keep/refresh comparison.

      1. Hypothetical scenario: a hardware refresh costs $80B and the actual performance increase is only worth $8B, but the share price bump increases the value of the org's holding of its own shares by $150B. As a CEO/CFO, which action would you recommend - without even considering your own bonus that's implicitly or explicitly tied to share price performance?

    • Demand/supply economics here is not so hypothetical.

      Illustration numbers: AI demand premium = $150 hardware with $50 electricity. Normal demand = $50 hardware with $50 electricity. That's Nvidia margins at 75% instead of 40%, and a capex/opex split of 70%/20% hardware/power instead of the customary 50%/40%.

      If the bubble crashes, i.e. the AI demand premium evaporates, we're back at $50 hardware and $50 electricity - likely $50 hardware and $25 electricity if hardware improves. Nvidia goes back to 30-40% margins, and operators on old hardware are stuck with stranded assets.

      The key thing to understand is that current racks are sold at grossly inflated premiums right now - scarcity pricing, a tax. If the current AI economic model doesn't work, then fundamentally that premium goes away, and subsequent build-outs will be priced cost-plus/commodity = capex discounted by non-trivial amounts. Any breakthrough in hardware, e.g. TPU compute efficiency, would stack opex (power) savings on top. Maybe by year 8, the first generation of data centers is still being amortized at $80 hardware + $50 power vs a new center at $50 hardware + $25 power. That old data center is a massive write-down, because it will generate less revenue than it costs to amortize.
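      Spelling out that year-8 write-down with the illustration numbers above (a sketch; the new entrant's 20% margin is an extra assumption):

      ```python
      # Sketch of the stranded-asset math using the comment's illustration
      # numbers (arbitrary units). The 20% margin is an added assumption.

      old_amortized_hw, old_power = 80, 50  # year-8 book value still amortizing
      new_hw, new_power = 50, 25            # post-bubble build-out

      old_cost = old_amortized_hw + old_power  # 130
      new_cost = new_hw + new_power            # 75

      # Assume the new entrant sets the market price at cost plus 20%.
      market_price = new_cost * 1.2
      print(f"Market price: {market_price:.0f}, old-center cost: {old_cost}")
      print(f"Old center margin: {market_price - old_cost:+.0f} "
            "(negative = revenue below amortization = write-down)")
      ```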

    • A typical data centre is $2,500 per year per kW of load (including overhead, HVAC, and so on).

      If it costs $800,000 to replace the whole rack, then that would pay off in a year if it eliminates 320 kW of consumption. Back when we ran servers, we wouldn't assume 100% utilisation, but AI workloads do; normal server loads would be 10 kW per rack and AI is closer to 100. So yeah, it's not hard to imagine power savings of 3.2 racks' worth being worth it.
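      The same arithmetic as a quick sketch (the $2,500/kW-year figure is from above; the 100 kW AI rack load is a rough assumption):

      ```python
      # Payback check using the figures above: $2,500 per kW-year all-in,
      # $800,000 to replace a rack. The AI rack load is a rough assumption.

      cost_per_kw_year = 2_500
      rack_replacement = 800_000

      kw_to_break_even = rack_replacement / cost_per_kw_year  # 320 kW
      print(f"kW of savings needed for a 1-year payback: {kw_to_break_even:.0f}")

      ai_rack_kw = 100  # assumed load of a modern AI rack
      print(f"...i.e. roughly {kw_to_break_even / ai_rack_kw:.1f} AI racks' "
            "worth of power")
      ```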

      4 replies →

  • It depends on how much profit you are making. As long as you can still be profitable on the old hardware you don't have to upgrade.

It's not about assumptions on the hardware; it's about current demand for computation and the expected growth of business needs. Since we have a couple of years to measure against, it should be extremely straightforward to predict. As such, I have no reason to doubt the stated projections.

  • Networking gear was famously overbought. Enterprise hardware is tricky as there isn’t much of a resale market for this gear once all is said and done.

    The only valid use case for all of this compute that could reasonably replace AI is BTC mining. I'm uncertain whether the increased mining capacity would harm the market or not.

    • BTC mining on GPUs hasn't been profitable for a long time; it's mostly ASICs now. GPUs can be used for some other altcoins, which makes the potential market for used previous-generation GPUs even smaller.

      1 reply →

  • Failure rates also go up. For AI inference it's probably not too bad in most cases: just take the node offline and re-schedule the jobs to other nodes.

There is the opportunity cost of using a whole datacenter to house ancient chips, even if they're still running. You're thinking of it like a personal-use chip, which you can run as long as it is non-defective. But for datacenters it doesn't make sense to use the same chips for more than a few years, and I think 5 years is already stretching their real shelf life.

Do not forget that we're talking about supercomputers. Their interconnect makes machines not easily fungible, so even a low reduction in availability can have dramatic effects.

Also, after the end of the product life, replacement parts may no longer be available.

You need to get pretty creative with repair & refurbishment processes to counter these risks.

Historically, GPUs have improved in efficiency fast enough that people retired their hardware in way less than 5 years.

Also, historically the top-of-the-line fabs were focused on CPUs, not GPUs. That hasn't been true for a generation, so it's not really clear whether the depreciation speed will be maintained.

  • > Historically, GPUs have improved in efficiency fast enough that people retired their hardware in way less than 5 years.

    This was a time when the cost per transistor was decreasing rapidly. A few years earlier, even RAM cost was decreasing quickly. But those times are over now. For example, the PlayStation 5 (where the GPU is the main cost), which launched five years ago, has even increased in price! That is historically unprecedented.

    Most price/performance progress is nowadays made via better GPU architecture instead, but these architectures are already pretty mature, so the room for improvement is limited.

    Given that the price per transistor (which TSMC & Co are charging) has decreased ever more slowly in recent years, I assume it will eventually come almost to a halt.

    By the way, this is strictly speaking compatible with Moore's law, as it is only about transistors per chip area, not price. Of course the price per chip area was historically approximately constant, which meant exponentially increasing transistor density implied exponentially decreasing transistor price.

    • > This was a time when chip transistor cost was decreasing rapidly.

      GPUs were actually mostly playing catch-up. They were progressively becoming more expensive parts that could afford being built on more advanced fabs.

      And I'll have to point out, "advanced fabs" is a completely post-Moore's-law concept. Moore's law is literally about the number of transistors on the most economical package, not any bullshit about area density that marketing people invented in the last decade (you can go read the paper). With Moore's law, the cheapest fab improves quickly enough that it beats whatever more advanced fabs existed before you can even finish designing a product.

  • > that people retired their hardware in way less than 5 years.

    Those people are end consumers (like gamers) and, more recently, bitcoin miners.

    Gamers don't care for "profit and loss" - they want performance. Bitcoin miners do need to switch if they want to keep up.

    But will an AI data center do the same?

5 years is maybe referring to the accounting schedule for depreciation on computer hardware, not the actual useful lifetime of the hardware.

It's a little weird to phrase it like that, though, because you're right: it doesn't mean you have to throw it out. Idk if this is some reflection of how IBM handles finance stuff or what. Certainly not all companies throw out hardware the minute they can't claim depreciation on it. But I don't know the numbers.

Anyway, 5 years is an inflection point in the numbers. Before 5 years, you get depreciation to offset some of the cost of running. After 5 years, you do not, so the math does change.

  • That is how the investments are costed, though, so it makes sense when we're talking return on investment: you can compare with alternatives under the same evaluation criteria.

When you operate big data centers, it makes sense to refresh your hardware every 5 years or so, because that's the point at which the refreshed hardware is enough better to be worth the effort and expense. You don't HAVE to, but it's more cost-effective if you do. (Source: used to operate big data centers.)

It's worse than that in reality: AI chips are on a two-year cadence for backwards compatibility (NVIDIA can basically guarantee it, and you probably won't be able to pay real AI devs enough to stick around to build hardware workarounds). So their accounting is optimistic.

  • 5 years is a normal-ish depreciation time frame. I know they're gaming GPUs, but the RTX 3090 came out ~4.5 years before the RTX 5090. The 5090 has double the performance and 1/3 more memory, and the 3090 is still a useful card even after 5 years.

General question to people who might actually know.

Is there anywhere that does anything like Backblaze's Hard Drive Failure Rates [1], but for GPU failure rates in environments like data centers, high-performance computing, supercomputers, or mainframes?

[1] https://www.backblaze.com/blog/backblaze-drive-stats-for-q3-...

The best that came back on a search was a semi-modern article from 2023 [2] that appears to be a one-off and is mostly about consumer-facing GPU purchases rather than bulk data center, constant-usage conditions. It's just difficult to really believe some of these hardware depreciation numbers, since there appears to be so little info other than guesstimates.

[2] https://www.tweaktown.com/news/93052/heres-look-at-gpu-failu...

On continued checking, I found an arXiv paper from UIUC (Urbana-Champaign) about a 1,056-GPU A100 and H100 system. [3] However, the paper is primarily about memory issues and per-job downtime that cause task failures and work loss. GPU resilience is discussed, but mostly from the perspective of short-term robustness in the face of propagating memory-corruption issues and error correction, rather than multi-year, 100%-usage GPU burnout rates.

[3] https://arxiv.org/html/2503.11901v3

Any info on the longer term burnout / failure rates for GPUs similar to Backblaze?

Edit: This article [4] claims a 0.1-2% failure rate per year (0.8% estimated), with no real info about where the data came from (it cites "industry reports and data center statistics"), and then claims GPUs often last 3-5 years on average.

[4] https://cyfuture.cloud/kb/gpu/what-is-the-failure-rate-of-th...
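If the 0.8%/year estimate were anywhere near right, fleet-level attrition would look something like this (a sketch; the rate is the article's unsourced figure and the fleet size is hypothetical):

```python
# Expected attrition for a hypothetical fleet, assuming a constant annual
# failure rate of 0.8% (the article's unsourced estimate).

fleet_size = 100_000        # hypothetical GPU fleet
annual_failure_rate = 0.008

for year in range(1, 6):
    surviving = fleet_size * (1 - annual_failure_rate) ** year
    print(f"Year {year}: ~{fleet_size - surviving:,.0f} cumulative failures, "
          f"{surviving:,.0f} survivors")

# Even at the estimate's upper end (2%/yr), attrition alone wouldn't force
# a 5-year "refill"; economics, not hardware mortality, would have to drive it.
```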

Given power and price constraints, it's not that you cannot run them in 5 years' time; it's that you won't want to run them in 5 years' time, and neither will anyone else who doesn't have free power.

> AI accelerator cards all start dying around the 5 year mark,

More likely, the hardware will be so much better in 5 years that it is (very) uneconomical to run anything on the old hardware.

Actually, my biggest issue here is the assumption that, if it hasn't paid off, you don't just convert to regular data center usage.

Honestly, if we see a massive drop in DC costs because the AI bubble bursts, I will be stoked.