Comment by nathants

2 years ago

afaik not for my use case. i need low latency with no variability. you only get that staying within a single cpu cluster.

This all comes down to the cache coherency protocols. It's not surprising that you see increased latency on the Zen 4 microarchitecture because, as one of the parent commenters already said, it's almost as if you're running a NUMA architecture within a single physical chip.

When dealing with NUMA, we know that cross-socket (or, in this case, cross-CCD) latencies are always higher than the ones within the same socket or CCD, usually several times higher.

This article nicely lists the core-to-core latencies for the Intel and AMD microarchitectures: https://chipsandcheese.com/2023/07/17/genoa-x-server-v-cache...

So, if you're able to somehow take advantage of this knowledge in your code (e.g. by scheduling less latency-sensitive tasks to the other CCD), you may be able to improve your overall performance.
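As a rough illustration of keeping a latency-sensitive process on one CCD, here is a minimal Python sketch for Linux. It assumes that cache `index3` in sysfs is the L3 (usually true on these parts, but not guaranteed) and that CPUs sharing that L3 correspond to one CCD/CCX. The helper names `parse_cpu_list`, `l3_siblings` and `pin_to_l3_domain` are made up for this example; only `os.sched_getaffinity` and `os.sched_setaffinity` are real standard-library calls, and they are Linux-only.

```python
import os


def parse_cpu_list(text):
    """Parse a Linux cpulist string such as "0-7,16-23" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def l3_siblings(cpu=0):
    """Return the CPUs that share a last-level cache with `cpu`.

    Assumption: cache index3 is the L3 on this machine.
    """
    path = f"/sys/devices/system/cpu/cpu{cpu}/cache/index3/shared_cpu_list"
    with open(path) as f:
        return parse_cpu_list(f.read())


def pin_to_l3_domain(cpu=0):
    """Restrict the calling process to the CPUs sharing an L3 with `cpu`."""
    allowed = os.sched_getaffinity(0)      # CPUs we are currently allowed on
    target = l3_siblings(cpu) & allowed    # never request a CPU we cannot use
    if target:
        os.sched_setaffinity(0, target)
    return target
```

Calling `pin_to_l3_domain(0)` early in the latency-critical process keeps its threads within one L3 domain; less latency-sensitive work can be started separately and pinned to the siblings of a CPU on the other CCD. The same effect is available from the shell with `taskset` or `numactl`.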

  • It makes Ryzen 9 processors great for budget VFIO workstations because you can run an OS (either host and guest, or two guests) on each CCD.

It's not possible for every workload, and sometimes the necessary effort makes it infeasible. Getting a Xeon might be the cheaper option then ;-)

Edit: Though I'd still recommend heeding menaerus' excellent sibling answer. Maybe not for this project, but it is great knowledge to have in your domain and I'd expect it to be relevant for the future.