> this would be the first time that a high core count CCD will have the ability to support a V-Cache die. If AMD sticks to the same ratio of base die cache to V-Cache die cache, then each 32 core CCD would have up to 384MB of L3 cache which equates to 3 Gigabytes of L3 cache across the chip.
Good lord!
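Quick sanity check on the quoted cache math (a back-of-the-envelope sketch; the 384MB per-CCD figure is the article's projection, not a confirmed spec):

```python
# 256 cores at 32 cores per CCD implies 8 CCDs per package.
cores_total = 256
cores_per_ccd = 32
ccds = cores_total // cores_per_ccd            # 8

l3_per_ccd_mb = 384                            # projected base L3 + stacked V-Cache per 32-core CCD
l3_total_mb = ccds * l3_per_ccd_mb             # 8 * 384 = 3072 MB
print(ccds, l3_total_mb, l3_total_mb / 1024)   # 8 3072 3.0 -> the "3 Gigabytes of L3" figure
```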
> CCD
Core Complex Die - an AMD term for a chiplet that contains the CPU cores and cache. It connects to an IOD (I/O die) that does memory, PCIe etc (≈southbridge?).
Aside: CCX is Core Complex - see Figure 1 of https://www.amd.com/content/dam/amd/en/documents/products/ep...
For any other old fogeys to whom CCD means something different.
> memory, PCIe etc (≈southbridge?)
northbridge
To further expand on this, "southbridge" is what we now call a chipset expander (or one of 50 other company- or product-line-specific names).
It's a switch with a bunch of unified PHYs that can handle many different tasks (non-primary PCI-E lanes, SATA ports, USB ports, etc.), leveraging shared hardware to reduce silicon footprint while increasing utility, and it connects to PCI-E lanes on the CPU.
256 cores on a die. Stunning.
32 cores on a die, 256 on a package. Still stunning though
How do people use these things? Map MPI ranks to dies, instead of compute nodes?
640 cores should be enough for anyone
Tell that to Nvidia: Blackwell is already up to 752 cores (each with 32-lane SIMD).
That's going to run Cities Skylines 2 ~~really really well~~ as well as it can be run.
Does it actually scale well to that many cores? If so, that's quite impressive; most video game simulations of that kind benefit more from a few fast cores, since parallelizing simulations well is difficult.
Nope, see https://m.youtube.com/watch?v=44KP0vp2Wvg. It just didn't scale enough.
Intel's Clearwater Forest could be shipping even sooner, with 288 cores. https://chipsandcheese.com/p/intels-clearwater-forest-e-core...
It's a smaller, denser core, but still incredibly promising and very neat.
Someone needs to try running Crysis on that bad boy using the D3D WARP software rasterizer. No GPU, just an army of CPU cores trying their best. For science.
I wonder what Ampere (mentioned in that article) is going to do. At this rate they’ll need to release a 1000-core chip just to be noticeably “different.”
"E-cores" are not the same
Ah, I omitted to mention that with 256 cores, you get 512 threads.
256 Zen 6c cores. I can't wait for cloud vendors to get their hands on it. In a dual-socket config that is 512 cores and 1024 vCPUs per server node. We could get two nodes in a server; that is 1024 cores with 2048 threads.
Even with the slowest of all programming languages or frameworks, at 1 request per second per vCPU, that is 2K requests per second (quick math sketched below).
Pure brute force hardware scaling.
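The arithmetic above, written out as a sketch (the 1 request/second/vCPU figure is the commenter's deliberately pessimistic assumption, and the two-node chassis is hypothetical):

```python
# Per-node vCPU count for a dual-socket, SMT-enabled configuration.
sockets = 2
cores_per_socket = 256
threads_per_core = 2                                             # SMT
vcpus_per_node = sockets * cores_per_socket * threads_per_core   # 1024

nodes_per_server = 2                                             # hypothetical two-node chassis
req_per_sec_per_vcpu = 1                                         # worst-case throughput assumption
total_rps = nodes_per_server * vcpus_per_node * req_per_sec_per_vcpu
print(vcpus_per_node, total_rps)                                 # 1024 2048 -> roughly "2K requests per second"
```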
I'd just like to take a moment to appreciate chipsandcheese and how they fill the Anandtech-shaped void in my heart <3
random internet feedback:
I really wish the article had spent 2 seconds to spell out in parentheses what 'CCD' is (it's 'Core Complex Die', FYI).
This is a hardcore chip website. All their readers know what it is.
If their goal was to appeal to more casual readers, then I agree.
Well, it could also mean CCD (Charge-Coupled Device), which is also used in this field (or was?).
How is this sort of package cooled? Seems like you'd pretty much need to do some sort of water cooling right?
While the power draw might be high in absolute terms, the die area is also quite large. For example, the article's estimates add up to just 2000 mm² of silicon for the Epyc chip. For reference, a Ryzen 9950X (AMD's hottest desktop CPU) has a total die area of about 262 mm² and a PPT (maximum power draw) of ~230 W. This means that the max heat flux at the chip interface will almost certainly be lower on the Epyc chip than on the Ryzen (rough numbers sketched below) - I don't think we're going to be getting 1000 W+ PPT/TDP chips.
From that you can infer that liquid cooling shouldn't be needed just to get the heat off the chip.
There still are overall system power dissipation problems, which might lead you to want to use liquid cooling, but not necessarily.
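A rough sketch of that heat-flux comparison (the areas are the article's estimates; the 600 W EPYC socket power is purely a placeholder guess, since the real figure hasn't been announced):

```python
# Heat flux at the die/heat-spreader interface, in W per mm^2.
ryzen_area_mm2 = 262            # Ryzen 9950X total die area (approx.)
ryzen_ppt_w = 230               # its PPT limit
ryzen_flux = ryzen_ppt_w / ryzen_area_mm2        # ~0.88 W/mm^2

epyc_area_mm2 = 2000            # article's silicon estimate for the 256-core package
epyc_power_guess_w = 600        # hypothetical socket power, for illustration only
epyc_flux = epyc_power_guess_w / epyc_area_mm2   # ~0.30 W/mm^2

# Socket power at which the EPYC package would match the Ryzen's heat flux:
breakeven_w = ryzen_flux * epyc_area_mm2         # ~1756 W
print(round(ryzen_flux, 2), round(epyc_flux, 2), round(breakeven_w))
```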
For example, Super Micro will sell you air-cooled 1U servers with options for up to 400 W CPUs (https://www.supermicro.com/en/products/system/hyper/1u/as%20...)
You can move a lot of air with good efficiency even just by using bigger fans that don't need to spin as fast most of the time. Water cooling is a good default for power-dense workloads, but far from an absolute necessity in every case.
You can cool it however you want, but the better the cooling, the better the performance. We'll probably see heat pipes at a minimum.
Air, almost certainly. They always develop these chips within a thermal envelope, and that envelope should be within what air cooling can do.
PS. Having many cores doesn’t mean a lot more power. Multi-core performance can be made very efficient by running many cores at a lower clock rate.
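A toy illustration of that last point, assuming the textbook dynamic-power relation P ∝ C·V²·f with supply voltage roughly tracking frequency, so per-core power scales roughly with f³ (illustrative numbers only, not measured data):

```python
def relative_power(cores, freq_ghz, ref_freq_ghz=5.0):
    """Total power relative to a single core at ref_freq_ghz, under the P ~ f^3 toy model."""
    per_core = (freq_ghz / ref_freq_ghz) ** 3
    return cores * per_core

print(relative_power(16, 5.0))    # 16.0 -> 16 fast cores as the baseline
print(relative_power(256, 2.5))   # 32.0 -> 16x the cores at half the clock for only ~2x the power
```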
256c/512t off a single package… likely 1024 threads in a 2-CPU system.
Basically, we're about to reach the scale where a single rack of these is a whole datacenter from the nineties, or something like that.
Perhaps the most comparable 1990s system would be the SGI Origin 2800 (https://en.wikipedia.org/wiki/SGI_Origin_2000) with 128 processors in a single shared-memory multiprocessing system. The full system took up nine racks. The successor SGI Origin 3800 was available with up to 512 processors in 2002.
Each core is many times faster than a 90s CPU for various reasons as well. I think if you look at an entire rack, it's easily a multiple of a 90s datacenter.
The new double wide rack looks good
AMD Venice? 2005 is calling!
x86_64 server architecture 256 cores on a die.
Blackwell 100+200 compression spin lock documentation.
Haven't checked for a while, but does AMD at this point have any software that runs stably and efficiently?
Or are they still building chips no one wants to use because CUDA is the only thing that doesn't suck balls?
ROCm is pretty stable now.