Comment by mrlonglong

1 month ago

256 cores on a die. Stunning.

Intel's Clearwater Forest could ship even sooner, with 288 cores. https://chipsandcheese.com/p/intels-clearwater-forest-e-core...

It's a smaller, denser core, but still incredibly promising and so, so neat.

  • Someone needs to try running Crysis on that bad boy using the D3D WARP software rasterizer. No GPU, just an army of CPU cores trying their best. For science.

    • This has already been tried :)

      iirc, in 2016 a quad-core Intel CPU ran the original Crysis at ~15 fps

  • I wonder what Ampere (mentioned in that article) is going to do. At this rate they’ll need to release a 1000-core chip just to be noticeably “different.”

    • At some point won't the bandwidth requirements exceed the number of pins you can fit within the available package area? Presumably you'll end up back at a GPU-style design: high bandwidth, but a low maximum memory capacity.

      I wonder how many of these you could cram into 1U? And what the maximum next gen kW/U figure looks like.

    • Unfortunately Ampere has fallen pretty far behind AMD. I don't see much point to their recent CPUs.

  • "E-cores" are not the same

    • The 32-core-per-die AMD products are almost certainly Zen 6c, which is the same "idea" as Intel E-cores, albeit way less crappy.

      https://www.techpowerup.com/forums/threads/amd-zen-6-epyc-ve...

      EDIT: actually, now that I think about it some more, my characterization of Zen c cores as the same "idea" as Intel E-cores was pretty unfair too; they do serve the same market idea, but the implementation is so much less silly that it's a bit daft to compare them. Intel E-cores have different IPC, different tuning characteristics, and different feature support (i.e., they are usually a different uarch), which makes them really annoying to deal with. Zen c cores are usually the same cores with less cache and sometimes fewer or narrower ports, depending on the specific configuration.

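On the pins-versus-bandwidth question raised in the replies above, a back-of-the-envelope sketch may help. The channel count (12) and memory speed (DDR5-6400) below are assumptions picked for illustration, not specs of any chip mentioned in this thread, and `per_core_bandwidth_gbs` is a made-up helper name. It only computes peak theoretical DRAM bandwidth divided evenly across cores.

```python
def per_core_bandwidth_gbs(cores, channels=12, mt_per_s=6400, bytes_per_transfer=8):
    """Peak theoretical DRAM bandwidth per core, in GB/s (decimal).

    Defaults are illustrative assumptions: 12 channels of DDR5-6400,
    each moving 8 bytes (64 data bits) per transfer.
    """
    # MT/s * bytes per transfer = MB/s per channel; divide by 1000 for GB/s.
    total_gbs = channels * mt_per_s * bytes_per_transfer / 1000
    return total_gbs / cores


if __name__ == "__main__":
    for cores in (256, 288, 1000):
        print(f"{cores:>5} cores: {per_core_bandwidth_gbs(cores):.2f} GB/s per core")
```

Under those assumptions the per-core share shrinks linearly as cores are added, unless channels (and therefore pins) grow with them, which is the squeeze the comment is pointing at.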

32 cores on a die, 256 on a package. Still stunning though

  • How do people use these things? Map MPI ranks to dies, instead of compute nodes?

    • Yeah, there's an option to configure one NUMA node per CCD that can speed up some apps.

    • Gemma.cpp has nested thread pools, one per chiplet, and one across all chiplets. With such core counts it is quite important to minimize any kind of sharing, even RMW atomics.
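The nested-pool idea in the replies above can be sketched roughly as follows. This is not gemma.cpp's actual code, just a minimal Python illustration of the shape: one outer pool with a worker per chiplet, one inner pool per chiplet with a worker per core, and private partial results merged once instead of a shared counter. The names `chiplet_groups`, `chunk`, and `nested_sum` are hypothetical, the 256/32 defaults come from the core counts quoted in this thread, and thread pinning is omitted entirely. The same `chiplet_groups` split is also the core set you would bind each rank to if you mapped one MPI rank per die.

```python
from concurrent.futures import ThreadPoolExecutor


def chiplet_groups(n_cores, cores_per_chiplet):
    """Split core ids 0..n_cores-1 into contiguous per-chiplet groups."""
    return [list(range(start, min(start + cores_per_chiplet, n_cores)))
            for start in range(0, n_cores, cores_per_chiplet)]


def chunk(seq, n):
    """Split seq into n nearly equal contiguous slices (some may be empty)."""
    base, extra = divmod(len(seq), n)
    slices, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        slices.append(seq[start:start + size])
        start += size
    return slices


def nested_sum(data, n_cores=256, cores_per_chiplet=32):
    """Sum data with an outer pool across chiplets and an inner pool per chiplet."""
    groups = chiplet_groups(n_cores, cores_per_chiplet)

    def run_chiplet(args):
        group, part = args
        # Inner pool: one worker per core in this chiplet. Each worker
        # returns its own partial sum, so nothing is shared while it runs.
        with ThreadPoolExecutor(max_workers=len(group)) as inner:
            partials = list(inner.map(sum, chunk(part, len(group))))
        # One merge per chiplet, done by that chiplet's own worker.
        return sum(partials)

    # Outer pool: one worker per chiplet, each handed a disjoint slice.
    with ThreadPoolExecutor(max_workers=len(groups)) as outer:
        return sum(outer.map(run_chiplet, zip(groups, chunk(data, len(groups)))))


if __name__ == "__main__":
    print(nested_sum(list(range(10_000)), n_cores=8, cores_per_chiplet=4))
```

Python threads will not show the actual cache-line contention, so treat this as a picture of the structure only: work is split so that cross-chiplet communication happens once per chiplet at the end, rather than on every update.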

That's going to run Cities Skylines 2 ~~really really well~~ as well as it can be run.