Comment by menaerus
1 year ago
To saturate the bandwidth, you would need ~16 Zen 4 cores, but you could first try running
likwid-bench -t load -i 100 -w S0:5GB:8:1:2
and see what you get. I think you should be able to get somewhere around 200 GB/s.
With likwid-bench at S0:5GB:8:1:2 I get 129136.28 MB/s. At S0:5GB:16:1:2 it's 184734.43 MB/s, which is about the max (S0:5GB:12:1:2 is 186228.62 MB/s and S0:5GB:48:1:2 is 183598.29 MB/s). According to lstopo, my 9274F has 8 dies with 3 cores on each (currently each die is set to its own NUMA domain, L3 strat). In any case, I also gave `numactl --interleave=all likwid-bench -t load -w S0:5GB:48:1:2 -i 100` a spin and topped out at about the same place: 184986.45 MB/s.
Yes, you're correct that your CPU has 8 CCDs, but the bandwidth with 8 threads is already too low. Those 8 cores should be able to get you to roughly half of the theoretical bandwidth. For comparison, 8 Zen 5 cores can reach the ~230 GB/s mark.
Can you repeat the same likwid-bench experiment but with 1, 2 and 4 threads? I'm wondering at what point it begins to deteriorate quickly.
It may also be worth repeating the 8-thread run while forcing likwid-bench to pick every third physical core, so that you get one thread per CCD.
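As a rough sketch of how that "every third physical core" selection could be expressed, the snippet below builds a workgroup string in the same `domain:size:threads:chunk:stride` form used above. It rests on assumptions that are not confirmed in this thread: that SMT is enabled, that sibling hardware threads sit next to each other in likwid's domain list (which is why a stride of 2 skips siblings), and that chunk/stride selection works as modeled here. The helper names `select_positions` and `workgroup` are hypothetical, not part of LIKWID.

```python
def select_positions(num_threads: int, chunk: int, stride: int) -> list[int]:
    """Positions in the domain's hardware-thread list picked by chunk:stride.

    Model (an assumption): take `chunk` consecutive entries, then jump
    `stride` entries ahead from the start of the previous chunk.
    """
    return [(i // chunk) * stride + (i % chunk) for i in range(num_threads)]


def workgroup(domain: str, size: str, num_threads: int, chunk: int, stride: int) -> str:
    """Format a workgroup string like the S0:5GB:8:1:2 used in this thread."""
    return f"{domain}:{size}:{num_threads}:{chunk}:{stride}"


SMT_PER_CORE = 2   # assumption: SMT on, siblings adjacent in the list
CORES_PER_CCD = 3  # from the lstopo output reported for the 9274F

# Skipping 3 cores x 2 hardware threads lands on the first core of each CCD.
stride = SMT_PER_CORE * CORES_PER_CCD

print(workgroup("S0", "5GB", 8, 1, stride))
print(select_positions(8, 1, stride))
```

If the hardware-thread ordering on a given machine differs, the stride would need adjusting; checking the selection against `likwid-topology` output before trusting the numbers seems prudent.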
1: 33586.74, 2: 47371.93, 4: 65870.07 (MB/s)
With `likwid-bench -i 100 -t load -w M0:5GB:1 -w M1:5GB:1 -w M2:5GB:1 -w M3:5GB:1 -w M4:5GB:1 -w M5:5GB:1 -w M6:5GB:1 -w M7:5GB:1` we get 187976.60 MB/s.
Obviously there's a bottleneck somewhere: at 33.5 GB/s per channel, you'd expect to get close to 400 GB/s, but in reality it doesn't reach half of that. Bad MC? Bottleneck with the MB? Hard to tell, and I'm not sure that without swapping hardware there's much more that can be done to diagnose things.
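The arithmetic behind that estimate can be checked with a few lines. This is only a sketch using the numbers posted in this thread; the channel count of 12 is an assumption, chosen because it is what the ~400 GB/s figure implies (33.5 × 12 ≈ 402).

```python
# Measured likwid-bench "load" results quoted in this thread, in MB/s,
# keyed by thread count.
measured_mb_s = {1: 33586.74, 2: 47371.93, 4: 65870.07, 8: 129136.28, 16: 184734.43}
best_mb_s = 187976.60  # one thread per NUMA domain (M0..M7)

# Assumption: 12 memory channels, as implied by the ~400 GB/s estimate.
CHANNELS = 12

single = measured_mb_s[1]
extrapolated_mb_s = single * CHANNELS
fraction_of_extrapolated = best_mb_s / extrapolated_mb_s


def scaling_efficiency(threads: int) -> float:
    """Measured bandwidth relative to perfect linear scaling from 1 thread."""
    return measured_mb_s[threads] / (threads * single)


print(f"extrapolated: {extrapolated_mb_s / 1000:.1f} GB/s")
print(f"best measured: {best_mb_s / 1000:.1f} GB/s "
      f"({fraction_of_extrapolated:.1%} of extrapolated)")
for n in sorted(measured_mb_s):
    print(f"{n:>2} threads: {scaling_efficiency(n):.0%} of linear scaling")
```

The per-thread efficiency line is the more useful one for diagnosis, since it shows how early the scaling falls away from linear rather than just where it tops out.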