I haven't seen anyone make a benchmark which would show the effect, if it exists.
AMD didn't have to introduce a special driver for the Ryzen 9 5950X to keep threads resident on a "gaming" CCD. In workloads that used no more than 8 cores there was only a small difference between the 5950X and the Ryzen 7 5800X, unlike the slowdowns observed in the 7950X3D and 7900X3D at launch compared to the Ryzen 7 7800X3D.
When the L3 sizes differ across CCDs, the special AMD driver is needed to keep threads pinned to the larger-L3 CCD and off the smaller-L3 CCD, where their memory requests would try to exploit the other CCD's L3 as an L4. The driver reduces CCD-to-CCD data requests by keeping programs contained in one CCD.
With equal L3 caches, when a process spills onto the second CCD it will still use the first CCD's L3 as an "L4", but it no longer has to evict that data at the same rate as on the lopsided models. Additionally, the first CCD can use the second CCD's L3 in kind, reducing the number of requests that have to go to main memory.
The same-sized L3s reduce contention on the IO die, and the larger L3s reduce memory contention: it's a win-win.
The gain is very workload dependent, so there are no generally-applicable rules.
There are many applications which need synchronization between threads, so the speed of the slowest thread has a disproportionate influence on the performance.
In such applications the slowest thread has a cache three times bigger on the X3D2 than on the X3D. That can make a lot of difference.
So there will be applications with no difference in performance, but also applications with a very large difference in performance, equal to the best performance differences shown by X3D vs. plain 9950X.
It really comes down to how much more this CPU costs than the next one down if you're building a new rig for a long period of time. I'm running a 5950X which is coming up on six years old in November. I could have spent a little less on the next model down, but I expect this rig to last me a few more years (especially given what memory costs now). The extra per-year expense for that CPU was almost nothing over its lifetime.
Now, would I upgrade an existing computer with a slightly slower processor to it? Probably not.
I know RAM prices are high, but the 256GB RAM limit seems like an omission. If they supported at least 512GB over quad or eight channels, that would be something worth looking at for me. I know there is Threadripper, but ECC memory is out of reach.
Given that the dies still have L3 on them does this count as L4 or does the hardware treat it as a single pool of L3?
Would be neat to have an additional cache layer of ~1 GB of HBM on the package but I guess there's no way that happens in the consumer space any time soon.
Per compute die it functions as a single 96MB L3 with uniform latency, at 4 cycles more latency than the configuration with the smaller 32MB L3. But there are two compute dies, each with its own L3, and as on the 9950X, coherency between these two L3s is maintained over the global memory interconnect through the third (IO) die.
But to do it literally - I'm not a low-level motherboard EE, but I'd bet you're looking at 5 to 7 figures (US $) of engineering work, to get around all the ways in which that would violate assumptions baked into the designs of the CPU, support chips, firmwares, etc.
Make a fake RAM module that offers a write-through guarantee and answers the bus no matter what address is referenced. You could possibly short-circuit any "is RAM there" test if it just says yes for whatever size and stride got configured.
Makes sense. RAM pricing has surely led to a fall in high-end AM5 CPU purchases, so they might as well try to get some extra cash from those who still buy. Bin the remaining non-X3D chips as something else.
Probably fun for those who already bought DDR5 memory... still kicking myself for not just pulling the trigger on that 128GB dual stick kit I looked at for $600 back in September. Now it's listed at $4k...
Meanwhile I hope my AM4 will chug along a few more years.
> Now it's listed at $4k...
You can buy 128GB of DDR5-6000 with a 9950X3D (not this newest X2 version, but still a $699 CPU) and a motherboard and a case for $2800 right now: https://www.newegg.com/Product/ComboDealDetails?ItemList=Com...
If you don't need 128GB, there are quality 64GB kits for under $700 on Newegg right now, which is cheaper than this CPU.
If someone needs to build something now and can wait to upgrade RAM in a year or two, 32GB kits are in the $370 range.
I don't like this RAM price spike either, but in the context of building a high-end system with a 16-core flagship CPU like this and probably an expensive GPU, it's still reasonable to build a system. If you must have 128GB of RAM it can be done with bundles like the one I linked above but I'd recommend waiting at least 6 months if you can. There are signs that prices are falling now that panic-buying has started to trail off.
128GB of RAM should not cost $4K even in this market.
$2800 is still a huge price compared with last year.
Last summer, a 9950X3D + motherboard + cooler + 128 GB DRAM + VAT sales taxes was the equivalent of $1400 in Europe, where I live.
That's half of your quoted price. That was without case and PSU, but adding e.g. $200 for those would not change much.
I bought 192GB (4×48GB) of DDR5-6400 for €299 in September but returned it because I couldn't get 4 DIMMs to run at decent speeds in the system.
Six or so weeks after I returned it, the kit was listed at €1,499.
No such bundle deals where I am. Absolute cheapest DDR5 128GB kit around is 2 sticks of 5600 64GB for $2k.
Cheapest 64GB kit is $930.
The kit I was oh-so-close to buying was two 6400 64GB sticks.
Not gonna buy now, not that desperate. I have a spare AM4 board, DDR4 memory and heck even CPU, I'll ride this one out. Likely skip AM5 entirely if something doesn't drastically change.
That "you don't need 128GB" line is toxic. What if you want to upgrade from DDR4 and you already have 128?
I really want an X3D because a game I play is heavily single-threaded. I have the income and the financial stability, but I can't in good conscience upgrade to AM5 at these RAM prices. It's insane.
Yep exactly the same situation.
I would not be surprised if we see casualties in adjacent markets, such as motherboards, coolers and whatnot.
AMD had an upgrade path with the 5700x3d, assuming you’re on AM4.
Just reading now that they went out of production half a year ago which is a shame. I was very impressed being able to upgrade with the same motherboard 6 years down the line.
I was waiting too, but the one game I play often that requires FPS performance got ruined by poor development direction. Now I'm planning to buy for local LLM hosting.
Here's hoping for more developments like TurboQuant to improve LLM memory efficiency.
What game, if you don't mind my asking?
Wonder how many sales AMD and Intel are losing because of tight DDR5 supply.
I can't imagine it's looking good in the consumer space, but server space seems to be lit[1]:
Su said that typically, the first quarter (Q1) is slower due to seasonal patterns, but AMD has seen its data center business expand from Q4 into Q1, demonstrating ongoing strength across both CPUs and GPUs. This growth underscores the company’s ability to capitalize on rising demand for AI compute and enterprise workloads, even during traditionally quieter periods.
“We are going into a big inflection year here in 2026. The CPU business is absolutely on fire.”
[1]: https://stocktwits.com/news-articles/markets/equity/amd-ceo-...
None. Every component is seeing huge demand.
oh wow you weren't joking: https://pcpartpicker.com/products/memory/#xcx=0&b=ddr5&Z=131...
(cheapest at $1240 USD)
PCPartPicker are also publishing charts showing the astronomic rise in DDR5 prices over time: https://pcpartpicker.com/trends/price/memory/. Those charts don't cover any kits with 64 GB sticks, but they're a good demonstration of the general scale.
> Probably fun for those who already bought DDR5 memory
Nah, those of us who already bought DDR5 memory also already bought decent CPUs. Dropping another $1k for these incremental gains would be silly. It'd make a lot more sense if DDR5 had been around longer so that people had the option to make generational upgrades to this CPU but DDR5 on AMD has only been around for Zen4 and Zen5.
I am glad I decisively ordered 96GB (2x48) DDR5 ECC back in June, alongside the 9800x3d.
I hope this is still enough for the planned upgrade to Zen7 in 2028.
I'm looking at building a new system, and was waiting to see what happens with this chip and Intel's Arc Pro B70 card. I can't find ECC UDIMMs of 64GB per-stick to make 128GB, but I can put together two solo UDIMMs of 32GB or 48GB for $800 and $1000 per stick respectively.
I really want to see what enabling the L3 cache options in the BIOS do from a NUMA standpoint. I have some projects I want to work on where being able to even just simulate NUMA subdivisions would be highly useful.
You're basically me. I was mulling 48 vs 96, decided $200 wasn't worth quibbling too much over, and bought 96GB in August.
Feeling pretty chuffed now XD (though still sad because building a new PC is dumb when RAM costs more than a 24 core monster CPU)
Same... got 2x48 DDR5 for $304 back in February of 2025. Equivalent kits are going for $900-$1,100. Madness.
>Meanwhile I hope my AM4 will chug along a few more years.
I am fine with my 2 year old 128GB DDR4 for now. I will just upgrade the 14700K to 14900KS CPU and wait 2 more years.
Judging by the benchmarks newer CPUs aren't much better for multithreading workloads than 14900KS anyway, so it doesn't make a lot of sense to upgrade to newer CPUs, DDR5 and a new mobo.
After the AM4 CPU and motherboard in my 4-year-old PC randomly broke last year, I found that buying new parts and rebuilding would have cost me almost as much as a new PC. Less if I did a complete rebuild myself, but I'm over building PCs; I've done that for years.
It was an expensive mistake, as I bought a few options to experiment with, including a NUC and an M4 Mac Mini, but eventually bought a 9800X3D / 5070 Ti PC for under $2k, and for no reason in particular I bought a 64GB DDR5-6000 kit for $200 in August or so. I checked recently and that kit is pushing $1000. I also bought a 4080 laptop last year, plus a 64GB kit and an extra SSD for it.
That's pretty lucky given what's happened since. I don't claim any kind of foresight about what would happen.
I do kind of want to take the parts I have and build another AM4 PC. The 5900XT is not a bad option with 16 cores for ~$300 but my DDR4 RAM is almost useless because the best deals now are for combos of CPU + motherboard + RAM at steep discounts.
You can get some good deals on prebuilts still. Not as good as 6+ months ago but still not bad. Costco has a 5080 PC for $2300. There's no way I'm going overboard and building a 128GB+ PC right now.
I've seen multiple RAM spikes. We had one at the height of the crypto hysteria IIRC but this is significantly worse and is also impacting SSDs. I kinda wish I'd bought 1-2 4TB+ SSDs last year but oh well.
We're really waiting for the AI bubble to pop. Part of me thinks that'll happen in the next year, but it could stay irrational substantially longer than that.
The C30 64GB kits are nearly impossible to buy now, so, well done. I got one in September '23 for ~$380 AUD; on the rare occasions it's available today, it's been over $1600 AUD.
I upgraded my UPS to a pure sine wave line-interactive unit to minimise the risk of the PC dying to bad power while the market is so crazy...
Oh man. I'm running computations on my server that involve computing geodesic distances with the heat method. The job turns out to be an L3 cache thrasher, leaving my CPUs underutilized for multi-worker jobs... 208MB instead of my 25MB per socket sounds amazing.
Back in 2004 my PC had 256 MB of RAM. My relative's laptop had 128 MB. That's crazy when a modern CPU cache can theoretically host an OS (or even multiple OSes) from the early 2000s.
The Power4 MCM had 128 MB cache in 2001. The G4 TiBook sold the same year came with 128 MB of system RAM base, and OS X supported 64 MB configurations for a few years after this.
The RAM prices are so high and the storage is also getting more expensive every day, so we're forced to fit everything inside the CPU cache as a solution! /s
It would be interesting if it allowed using the cache as RAM and could boot without any sticks on the motherboard.
Crazy to think that my first personal computer's entire storage (was 160MB IIRC?) could fit into the L3 of a single consumer CPU!
It's probably not possible architecturally, but it would be amusing to see an entire early 90's OS running entirely in the CPU's cache.
https://github.com/coreboot/coreboot/blob/main/src/soc/intel...
Context: Early in the firmware boot process the memory controller isn't configured yet so the firmware uses the cache as RAM. In this mode cache lines are never evicted since there's no memory to evict them to.
In my case it began with 16K (yes, 16×1024 bytes) and 90K (yes, 90×1024 bytes) 5.25" floppy disks (although the floppies came a few months after the computer). Eventually upgraded to 48K RAM and 180K double-density floppy disks. The computer: Atari 800.
I'll see your Atari 800 and raise you my Atari 2600 with its whopping 128 bytes of RAM. Bytes with a B. I can kinda sorta call it a computer because you could buy a BASIC cartridge for it (I didn't and stand by that decision - it was pretty bad).
My first PC had a 20MB HDD and 512KB of RAM. So yeah, that could fit into cache 10 times now.
Maybe in 50 years the cache of CPUs and GPUs will be 1TB. Enough to run multiple LLMs (a model entirely run for each task). Having robots like in the movies would need LLMs much much faster than what we see today.
doubtful that we will still have this computer architecture by then
KolibriOS would fit in there, even with the data in memory. You cannot load it into the cache directly, but when the cache capacity is larger than all the data you read there should be no cache eviction and the OS and all data should end up in the cache more or less entirely. In other words it should be really, really fast, which KolibriOS already is to begin with.
Unless you lay everything out contiguously in memory, you'll still get cache eviction due to associativity, depending on the eviction strategy of the CPU. But certainly DOS or even early Windows 95 could conceivably run entirely out of the cache.
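The associativity point can be illustrated with a toy LRU set-associative model (all parameters here are invented for illustration; real CPUs differ): even a working set at half the cache's capacity misses constantly if its lines alias into the same sets.

```python
from collections import OrderedDict

LINE = 64    # cache line size in bytes (assumed)
SETS = 2048  # number of sets (assumed)
WAYS = 8     # associativity (assumed); capacity = 64 * 2048 * 8 = 1 MiB

def misses(addresses):
    """Count misses for an LRU set-associative cache over an address trace."""
    sets = [OrderedDict() for _ in range(SETS)]
    miss = 0
    for addr in addresses:
        line = addr // LINE
        s = sets[line % SETS]
        if line in s:
            s.move_to_end(line)        # hit: refresh LRU position
        else:
            miss += 1
            if len(s) >= WAYS:
                s.popitem(last=False)  # evict least recently used way
            s[line] = None
    return miss

# 512 KiB working set laid out contiguously: spreads evenly over all sets.
contiguous = [i * LINE for i in range(8192)]
# Same number of lines, strided so every line maps to the same set.
aliased = [i * LINE * SETS for i in range(8192)]

print(misses(contiguous * 2))  # 8192: second pass hits entirely
print(misses(aliased * 2))     # 16384: aliasing evicts everything, every pass misses
```

Two passes over the contiguous layout miss only on the first pass; the aliased layout, despite touching the same amount of data, never hits at all.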
That assumes KolibriOS or any major component is pinned to one core and one cache slice instead of getting dragged between CCDs or losing memory affinity. Throw actual users, IO, and interrupts at it and you get traffic across chiplets, or at least across L3 groups, so the nice 'everything lives in cache' story falls apart fast.
Nice demo, bad model. The funny part is that an entire OS can fit in cache now, the hard part is making the rest of the system act like that matters.
You had ~160,000 times more storage than I did for my first personal computer.
Commodore PET for me - 8 KB of RAM and all the data you could store and read back from a TDK 120 cassette tape . . .
* https://en.wikipedia.org/wiki/Commodore_PET
Same time as the Trash-80 and BBC micro were making inroads.
IIRC some relatively strange CPUs could run with unbacked cache.
Intel's platforms, at the very least, use cache-as-RAM during the boot phase before the DDR interface can be trained and started up. https://github.com/coreboot/coreboot/blob/main/src/soc/intel...
> it would be amusing to see an entire early 90's OS running entirely in the CPU's cache.
There are actually already two running (MINIX and UEFI), and it's the opposite of amusing - https://www.zdnet.com/article/minix-intels-hidden-in-chip-op...
My first PC had a 40MB HDD and 8MB RAM :D
I wonder how much faster dos would boot, especially with floppy seek times...
Instantly.
If you run a VM on a CPU like this, using a baremetal hypervisor, you can get very close to "everything in cache".
You can get close with a VM, but there's overhead in device emulation that slows things down.
Consider a VM where that kind of stuff has been removed, like the firecracker hypervisor used for AWS Lambda. You're talking milliseconds.
640K ought to be enough for anybody.
My first computer's whole RAM could fit in the L1 of a single core (128K).
The extra cache doesn't do a damn thing (maybe +2%)
The lower leakage currents at lower voltages allowed them to implement a far more aggressive clock curve from the factory. That's where the higher all-core clock comes from (+30W TDP).
I'm not complaining at all, I think this is an excellent way to leverage binning to sell leftover cache.
Though if I may complain, Ars used to actually write about such things in their articles instead of speculate in a way that suspiciously resembles what an AI would write.
> The extra cache doesn't do a damn thing (maybe +2%)
It depends on the task. For some memory-bound tasks the extra cache is very helpful. For CFD and other simulation workloads the benefits are huge.
For other tasks it doesn't help at all.
If someone wants a simple gaming CPU or general-purpose CPU, they don't need to spend the money for this; they don't need a 16-core CPU at all. The 9850X3D is a better buy for most users who aren't frequently doing a lot of highly parallel work.
CFD benefits from cache, but it benefits even more from sustained memory bandwidth, no? A small(ish) chunk of L3 + two channels of DRAM is not going to compete with a quarter as much L3 plus eight channels of DRAM when typical working set sizes (in my experience) are in the tens of gigabytes, is it?
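A back-of-envelope sweep-time calculation makes the bandwidth argument concrete. All bandwidth figures below are assumptions for illustration, not measured specs:

```python
# Time to stream a solver's working set once at sustained memory bandwidth.
def sweep_seconds(working_set_gib, bandwidth_gib_s):
    """Lower bound on one pass over the data for a bandwidth-bound kernel."""
    return working_set_gib / bandwidth_gib_s

dual_channel = 80    # assumed sustained GiB/s, 2-channel desktop DDR5
eight_channel = 320  # assumed sustained GiB/s, 8-channel server DDR5

ws = 40  # GiB, a mid-size CFD case per the comment above
print(f"2-channel: {sweep_seconds(ws, dual_channel):.3f} s per sweep")
print(f"8-channel: {sweep_seconds(ws, eight_channel):.3f} s per sweep")

# 40 GiB dwarfs ~0.2 GiB of L3, so cache can only shave a small fraction
# of that traffic, while 4x the channels cuts the whole sweep 4x.
```

Under these assumed numbers the eight-channel machine finishes each sweep 4× faster regardless of L3 size, which is the commenter's point about working sets in the tens of gigabytes.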
Sorry, what is "CFD" in this context?
But the consumer parts do not support SDCI (only Epyc Turin supports it), so it does not benefit much if an accelerator is involved.
It really doesn't. In virtually every case the work is being completed faster than the cache can grow to that size. What little gains are being realized are from not having to wait for cores with access to the cache to become available.
It's very workload dependent. It certainly does more than 2% on many workloads.
See https://www.phoronix.com/review/amd-ryzen-9-9950x3d-linux/10
> Here is the side-by-side of the Ryzen 9 9950X vs. 9950X3D for showing the areas where 3D V-Cache really is helpful:
Coincidentally, it looks like they filtered to benchmarks with differences greater than 2%. The biggest speedup is 58.1%, and that's with 3D V-Cache on only half the chip.
I think GP was saying that the additional 3D cache on this chip compared to the standard x3d isn’t going to do much.
I’m curious to see whether the same benchmarks benefit again so greatly.
I am so grateful that I bought my 128GB RAM kit in January of last year for my own 9950 upgrade. We just built my dad a 7000-series system to replace his old AM4 (2017 build), and 32GB of DDR5 was nearly the same price at Micro Center as what I paid last year. I was able to gift him an Nvidia 1060 discrete graphics card so that he could continue to run his two monitors. The newer motherboards have much less onboard capability for that.
1060 is a sweet card for multi monitor. good on you for gifting him.
I upgraded to a 4070 super last year. I ran both cards at the same time for a little bit, but it got really frustrating to keep the wrong card from being assigned to a particular task with llama. I really should’ve taken an R&D tax credit on my AI research but I’m still able to expense it for the business.
Breakdown of the (semi-clickbait) 208MB cache: 16MB L2 (8MB per die?) + 32MB L3 * 2 dies + 64MB L3 Stacked 3D V-cache * 2
For comparison, the 9950X3D has a total cache of 144MB.
> 16MB L2 (8MB per die?)
It is indeed 8MB per compute die but really 1MB per core. Not shared among the entire CCD.
I wouldn’t be caught dead with less than 200MB of cache in my desktop in 2026.
I'm interested to know if the L3 cache all behaves as a single pool for any core on either CCD, whether there's a penalty in access time depending on locality or whether they are just entirely localised.
The short answer is that L3 is local to each CCD.
And that answer is good enough for most workloads. You should stop reading now.
_______________________
The complex answer is that there is some ability for one CCD to pull cachelines from the other CCD. But I've never been able to find a solid answer on the limitations of this. I know it can pull a dirty cacheline from the L1/L2 of another CCD (this is the core-to-core latency test you often see in benchmarks, and there is an obvious cross-die latency hit).
But I'm not sure it can pull a clean cacheline from another CCD at all, or if those just get redirected to main memory (as the latency to main memory isn't that much higher than between CCDs). And even if it can pull a clean cacheline, I'm not sure it can pull them from another CCD's L3 (which is an eviction cache, so only holds clean cachelines).
The only way for a cacheline to get into a CCD's L3 is to be evicted from an L2 on that core, so if a dataset is active across both CCDs, it will end up duplicated across both L3s. Cachelines evicted from one L3 do NOT end up in another L3, so an idle CCD can't act as a pseudo L4.
I haven't seen anyone make a benchmark which would show the effect, if it exists.
AMD didn't have to introduce a special driver for the Ryzen 9 5950X to keep threads resident on the "gaming" CCD. In workloads that didn't use more than 8 cores there was only a small difference between the 5950X and the Ryzen 7 5800X, unlike the slowdowns observed in the Ryzen 9 7950X3D and 7900X3D at release compared to the Ryzen 7 7800X3D.
When the L3 sizes differ across CCDs, the special AMD driver is needed to keep threads pinned to the larger-L3 CCD and prevent them from being placed on the smaller-L3 CCD, where their memory requests would exploit the other CCD's L3 as an L4. The AMD driver reduces CCD-to-CCD data requests by keeping programs contained in one CCD.
With equal L3 caches, when a process spills onto the second CCD it will still use the first CCD's L3 as an "L4", but it no longer has to evict that data at the same rate as on the lopsided models. Additionally, the first CCD can use the second CCD's L3 in kind, reducing the number of requests that need to go to main memory.
The same sized L3s reduce contention to the IO die and the larger sized L3s reduce memory contention, it's a win-win.
https://www.phoronix.com/review/amd-3d-vcache-optimizer-9950...
Whenever I see a chip like this, I think "why won't my company let me use a decent computer?"
so you're telling me I can (theoretically) have a full Alpine Linux installation in just the CPU? I'm impressed
9950X3D2? AMD, who is making you name your products like this? At some point just give up and name the chip a UUID already.
I actually don't mind this one, 9950 is the actual chip, x3d is the cache (where it's larger) and the 2 stands for it being on both chiplets.
Like your UUID joke but agree with sibling comment that 9950X3D2 is actually a good name.
can't agree. this name has logical meaning
Can someone explain whether the 3D V-Cache dies are stacked on top of each other or placed side by side?
If they are stacked, then why not 9800X3D2?
The 99xx chips have two CPU dies, and one cache die is on each CPU die.
The 3D V-Cache sits underneath only one of the CCDs. See https://en.wikipedia.org/wiki/Ryzen#Ryzen_9000.
I don't really see a huge reason to buy this other than it being a top-tier halo product.
For gaming, AMD already pins the game threads to the CCD with the extra cache pretty well.
For multi-threaded workloads the gain from having cache on both CCDs is quite small.
The gain is very workload dependent, so there are no generally-applicable rules.
There are many applications which need synchronization between threads, so the speed of the slowest thread has a disproportionate influence on the performance.
In such applications, the slowest thread has a 3 times bigger cache on an X3D2 than on an X3D. That can make a lot of difference.
So there will be applications with no difference in performance, but also applications with a very large difference in performance, equal to the best performance differences shown by X3D vs. plain 9950X.
It really comes down to how much more this CPU costs over the next one down if you're building a new rig for a long period of time. I'm running a 5950X which is coming up on its sixth year in November. I could have spent a little less on the next model down, but I expect this rig will last me a few more years (especially with how much memory costs now). The extra per-year expense for that CPU was almost nothing over its lifetime.
Now, would I upgrade an existing computer with a slightly slower processor with it, probably not.
I know the prices of RAM are high, but the 256GB RAM limit seems like an omission. If they supported at least 512GB in quad- or eight-channel, that would be worth looking at for me. I know there is Threadripper, but ECC memory is out of reach.
Given that the dies still have L3 on them does this count as L4 or does the hardware treat it as a single pool of L3?
Would be neat to have an additional cache layer of ~1 GB of HBM on the package but I guess there's no way that happens in the consumer space any time soon.
Per compute die it functions as one 96MB L3 with uniform latency, at 4 cycles more latency than the configuration with the smaller 32MB L3. But there are two compute dies, each with its own L3. And as on the 9950X, coherency between these two L3s is maintained over the global memory interconnect through the third (IO) die.
Can someone like... boot Windows 98 on these on a system with no ram?!
Theoretically anything is possible with enough thought and work.
Conceptually - yes, easily.
But to do it literally - I'm not a low-level motherboard EE, but I'd bet you're looking at 5 to 7 figures (US $) of engineering work, to get around all the ways in which that would violate assumptions baked into the designs of the CPU, support chips, firmwares, etc.
Make a fake RAM device that offers a write-through guarantee and answers on the bus no matter what address is referenced. You could possibly short-circuit any "is RAM there" test if it just says yes for whatever size and stride got configured.
that is larger than the HDD of my first PC.
My first computer had 64KB of RAM. My first PC had 8MB of RAM.
Factorio mega basing just found a new ceiling.
I'm curious to see if that is true. The maximum amount of cache addressable per core didn't increase after all.
I have a gigabyte of cache on my 9684x at home!
With the best silicon tech, in R&D, what would be the maximum static RAM (L1 cache) you could really slap onto an 8-core CPU? (Zero DRAM.)
It's disappointing that they had this for years but didn't release it until now.
I think it’s mostly that they had leftover cache.
This video made the argument that AMD released it to not give Intel a look-in: [AMD KILLED Intel's 290K Dreams w/ R9 9950X3D2](https://www.youtube.com/watch?v=u7SyrDPbKls)
Makes sense. RAM pricing has surely led to a fall in high-end AM5 CPU purchases, so they might as well try to get some extra cash from those who still buy, and bin the remaining non-X3D chips as something else.