← Back to context

Comment by rubyn00bie

5 hours ago

I’ve not kept up with Intel in a while, but one thing that stood out to me is these are all E cores— meaning no hyperthreading. Is something like this competitive, or preferred, in certain applications? Also does anyone know if there have been any benchmarks against AMDs 192 core Epyc CPU?

In HPC, like physics simulation, they are preferred. There's almost no benefit of HT. What's also preferred is high cluck frequencies. These high core count CPUs nerd their clixk frequencies though.

  • I'm so sorry for being juvenile but "high cluck frequencies" may be my favorite typo of all time.

Without the hyperthreading (E-cores) you get more consistent performance between running tasks, and cloud providers like this because they sell "vCPUs" that should not fluctuate when someone else starts a heavy workload.

  • Sort of. They can just sell even numbers of vCPUs, and dedicate each hyper-thread pair to the same tenant. That prevents another tenant from creating hyper-threading contention for you.

"Is something like this competitive, or preferred, in certain applications?"

They cite a very specific use case in the linked story: Virtualized RAN. This is using COTS hardware and software for the control plane for a 5G+ cell network operation. A large number of fast, low power cores would indeed suit such a application, where large numbers of network nodes are coordinated in near real time.

It's entirely possible that this is the key use case for this device: 5G networks are huge money makers and integrators will pay full retail for bulk quantities of such devices fresh out of the foundry.

  • is RAM a concern in these cluster applications, cause if prices stay up, how do you get them off the shelf if you also need TB of memory.

    • > how do you get them off the shelf if you also need TB of memory

      You make products for well capitalized wireless operators that can afford the prevailing cost of the hardware they need. For these operations, the increase in RAM prices is not a major factor in their plans: it's a marginal cost increase on some of the COTS components necessary for their wireless system. The specialized hardware they acquire in bulk is at least an order of magnitude more expensive than server RAM.

      Intel will sell every one of these CPUs and the CPUs will end up in dual CPU SMP systems fully populated with 1-2 TB of DDR5-8000 (2-4GB/core, at least) as fast as they can make them.

E core vs P core is an internal power struggle between two design teams that looks on the surface like ARM’s big.LITTLE approach

  • E cores ruined P cores by forcing the removal of AVX-512 from consumer P cores

    Which is why I used AMD in my last desktop computer build

    • E cores didn't just ruin P cores, it ruined AVX-512 altogether. We were getting so close to near-universal AVX-512 support; enough to bother actually writing AVX-512 versions of things. Then, Intel killed it.

    • That's finally set to be resolved with Nova Lake later this year, which will support AVX10 (the new iteration of AVX512) across both core types. Better very late than never.

    • I love the AVX512 support in Zen 5 but the lack of Valgrind support for many of the AVX512 instructions frustrates me almost daily. I have to maintain a separate environment for compiling and testing because of it.

For an application like a build server, the only metric that really matters is total integer compute per dollar and per watt. When I compile e.g a Yocto project, I don't care whether a single core compiles a single C file in a millisecond or a minute; I care how fast the whole machine compiles what's probably hundred thousands of source files. If E-cores gives me more compute per dollar and watt than P-cores, give me E-cores.

Of course, having fewer faster cores does have the benefit that you require less RAM... Not a big deal before, you could get 512GB or 1TB of RAM fairly cheap, but these days it might actually matter? But then at the same time, if two E-cores are more powerful than one hyperthreaded P-core, maybe you actually save RAM by using E-cores? Hyperthreading is, after all, only a benefit if you spawn one compiler process per CPU thread rather than per core.

EDIT: Why in the world would someone downvote this perspective? I'm not even mad, just confused

  • Yocto's for embedded projects though, right?

    I imagine that means less C++/Rust than most, which means much less time spent serialized on the linker / cross compilation unit optimizer.

    • It's for building embedded Linux distros, and your typical Linux distro contains quite a lot of C++ and Rust code these days (especially if you include, say, a browser, or Qt). But you have parallelism across packages, so even if one core is busy doing a serial linking step, the rest of your cores are busy compiling other packages (or maybe even linking other packages).

      That said, there are sequential steps in Yocto builds too, notably installing packages into the rootfs (it uses dpkg, opkg or rpm, all of which are sequential) and any code you have in the rootfs postprocessing step. These steps usually aren't a significant part of a clean build, but can be a quite substantial part of incremental builds.

It all depends on your exact workload, and I’ll wait to see benchmarks before making any confident claims, but in general if you have two threads of execution which are fine on an E-core, it’s better to actually put them on two E-cores than one hyperthreaded P-core.

I think some of why is size on die. 288 E cores vs 72 P cores.

Also, there's so many hyperthreading vulnerabilities as of late they've disabled on hyperthreaded data center boards that I'd imagine this de-risks that entirely.

I don't know the nitty-gritty of why, but some compute intensive tasks don't benefit from hyperthreading. If the processor is destined for those tasks, you may as well use that silicon for something actually useful.

https://www.comsol.com/support/knowledgebase/1096

  • It's a few things; mostly along the lines of data caching (i.e. hyper threading may mean that other thread needs a cache sync/barrier/etc).

    That said I'll point to the Intel Atom - the first version and refresh were an 'in-order' where hyper-threading was the cheapest option (both silicon and power-wise) to provide performance, however with Silvermont they switched to OOO execution but ditched hyper threading.

  • Yeah of you are running Comsol you need real cores + high clock frequency + high memory bandwidth.

    Gaming CPUs and some EPYCs are the best

I guess it competes with the like of Ampere's ARM servers? I'm sure there are use cases for lots and lots of weak cores, in telecom especially.

It's a trade off. Hyperthreading takes up space on the die and the power budget.

As to E core itself - it's ARM's playbook.