Comment by menaerus

2 years ago

This all comes down to the cache coherency protocols. It's not surprising that you see increased latency on the Zen 4 microarchitecture because, as one of the parent commenters already said, it's almost as if you're running a NUMA architecture within a single physical chip.

When dealing with NUMA, we know that cross-socket (or, in this case, cross-CCD) latencies are always higher than the ones within the same socket or the same CCD, usually several times higher.

This article nicely lists the core-to-core latencies between the Intel and AMD microarchitectures: https://chipsandcheese.com/2023/07/17/genoa-x-server-v-cache...

So, if you're able to somehow take advantage of this knowledge in your code (e.g. by scheduling less latency-sensitive tasks to the other CCD), you may be able to improve your overall performance.
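As a rough illustration of the idea, here is a minimal Python sketch for Linux. It assumes that each CCD corresponds to one shared L3 cache (which should hold for Zen 4, but check your own topology) and that the kernel exposes cache topology under /sys/devices/system/cpu. The function names are made up for this example; only `os.sched_setaffinity` and the sysfs files are real interfaces.

```python
import os
from pathlib import Path


def parse_cpu_list(text):
    """Parse a Linux CPU list string such as '0-7,16-23' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def l3_domains(sysfs_root="/sys/devices/system/cpu"):
    """Group logical CPUs by the L3 cache they share.

    Assumption: one L3 domain == one CCD. Returns a list of frozensets,
    ordered by the lowest CPU number in each group.
    """
    domains = set()
    for cache_dir in Path(sysfs_root).glob("cpu[0-9]*/cache/index*"):
        level_file = cache_dir / "level"
        list_file = cache_dir / "shared_cpu_list"
        if not (level_file.is_file() and list_file.is_file()):
            continue
        if level_file.read_text().strip() == "3":
            domains.add(frozenset(parse_cpu_list(list_file.read_text())))
    return sorted(domains, key=min)


def pin_caller_to(cpus):
    """Restrict the calling thread to the given CPUs (Linux only)."""
    os.sched_setaffinity(0, cpus)
```

On a dual-CCD part, `l3_domains()` should return two groups. You could then call `pin_caller_to(domains[0])` from the latency-sensitive threads and `pin_caller_to(domains[1])` from the background ones, so that the hot threads never have to go cross-CCD to talk to each other.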

It makes Ryzen 9 processors great for budget VFIO workstations because you can run one OS per CCD (either the host on one and a guest on the other, or two guests).