
Comment by nathants

2 years ago

yeah? a xeon with 16 cores might have to be my next pc. pricey though.

5 comments

archi42 2 years ago

You can try to make your code aware of the situation and distribute the tasks accordingly. IIRC that's what people do on NUMA systems.

nathants 2 years ago
afaik not for my use case. i need low latency with no variability. you only get that staying within a single cpu cluster.
- menaerus 2 years ago
  
  This all comes down to the cache coherency protocols. And it's not surprising that you see increased latency with the Zen 4 microarchitecture because, as one of the parent commenters already said, it's almost as if you're running a NUMA architecture within a single physical chip.
  When dealing with NUMA we know that cross-socket, or in this case cross-CCD, latencies are always higher than the ones within the same socket or same CCD, usually several times higher.
  This article nicely lists the core-to-core latencies between the Intel and AMD microarchitectures: https://chipsandcheese.com/2023/07/17/genoa-x-server-v-cache...
  So, if you're able to somehow take advantage of this knowledge in your code (e.g. by scheduling less latency-sensitive tasks to the other CCD), you may be able to improve your overall performance.
  
- archi42 2 years ago
  
  It's not possible for every workload, and sometimes the necessary effort makes it infeasible. Getting a Xeon might be the cheaper option then ;-)
  Edit: Though I'd still recommend heeding menaerus's excellent sibling answer. Maybe not for this project, but it is great knowledge to have in your domain and I'd expect it to be relevant in the future.
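
For anyone who wants to try the "make your code aware of it" route from the comments above, here is a minimal Python sketch, Linux only. It assumes that cores sharing an L3 cache sit on the same CCD (typical for recent Zen parts, but worth checking on your own machine) and that the kernel exposes cache topology under /sys/devices/system/cpu. The function names (parse_cpu_list, l3_groups, read_l3_shared_lists, pin_to) are made up for this example and are not from any library.

```python
import os
from pathlib import Path


def parse_cpu_list(text):
    """Parse a Linux cpulist string such as "0-7,16-23" into a sorted list of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def l3_groups(shared_lists):
    """Turn {cpu id: cpulist string} into a sorted list of distinct L3 sharing groups."""
    unique = {tuple(parse_cpu_list(text)) for text in shared_lists.values()}
    return [list(group) for group in sorted(unique)]


def read_l3_shared_lists(sysfs_root="/sys/devices/system/cpu"):
    """Read the shared_cpu_list of every level-3 cache entry found in sysfs.

    Returns an empty dict if the files are missing (non-Linux, containers, etc.).
    """
    result = {}
    for level_file in Path(sysfs_root).glob("cpu[0-9]*/cache/index*/level"):
        if level_file.read_text().strip() != "3":
            continue
        cpu = int(level_file.parts[-4][3:])  # directory name "cpu12" -> 12
        result[cpu] = (level_file.parent / "shared_cpu_list").read_text()
    return result


def pin_to(cpus):
    """Restrict the caller to the given CPUs; returns False where the call is unavailable."""
    if not hasattr(os, "sched_setaffinity"):
        return False
    os.sched_setaffinity(0, cpus)  # 0 means "the caller"
    return True


if __name__ == "__main__":
    # Print the detected groups; on a two-CCD part you would hope to see two lists.
    print("L3 sharing groups:", l3_groups(read_l3_shared_lists()))
```

The idea would be to call pin_to() with one group from the latency-critical worker and hand the other group to everything else. Pinning only limits where the scheduler may place you; it does not stop other processes from landing on the same cores, so it is a starting point, not a guarantee of zero variability.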