Comment by treesknees
7 hours ago
Something I didn’t see mentioned is that this unequal memory access time also affects PCIe I/O. If your thread on CPU A needs to get data in or out of a NIC attached to CPU B, your throughput and latency will suffer.
We have to explain this to customers of our software all the time; it’s easy to miss.
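For anyone who wants to check this on their own box, here is a minimal sketch (Linux only, and not how any particular product does it) that reads the NIC's NUMA node from sysfs and pins the current process to the CPUs on that node. The function names and the `sysfs` parameter are made up for illustration; the sysfs files themselves (`/sys/class/net/<iface>/device/numa_node` and `/sys/devices/system/node/nodeN/cpulist`) are the standard kernel ones.

```python
import os
from pathlib import Path


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-3,8-11' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def nic_numa_node(iface, sysfs=Path("/sys")):
    """Return the NUMA node a NIC hangs off, or None if it is unknown.

    Virtual interfaces have no backing PCI device, so the file is missing.
    """
    path = Path(sysfs) / "class" / "net" / iface / "device" / "numa_node"
    try:
        node = int(path.read_text().strip())
    except (OSError, ValueError):
        return None
    return node if node >= 0 else None  # the kernel reports -1 when unknown


def node_cpus(node, sysfs=Path("/sys")):
    """Return the set of CPU ids that belong to the given NUMA node."""
    path = Path(sysfs) / "devices" / "system" / "node" / f"node{node}" / "cpulist"
    return parse_cpulist(path.read_text())


def pin_to_nic(iface, sysfs=Path("/sys")):
    """Restrict the calling process to CPUs local to the NIC (hypothetical helper)."""
    node = nic_numa_node(iface, sysfs)
    if node is None:
        return None
    cpus = node_cpus(node, sysfs)
    os.sched_setaffinity(0, cpus)  # 0 means "the calling process"
    return cpus
```

Note that this only sets CPU affinity. Memory placement is a separate policy; running the process under `numactl --cpunodebind=N --membind=N` covers both.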
Was going to mention this too, as it burned me once. Not because I didn't know about it, but because I was accidentally running stuff on the wrong node, and it wasn't obvious which slot belonged to which node.
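One rough way to answer the "which slot is which node" question on Linux is to group PCI devices by the `numa_node` value the kernel exposes under `/sys/bus/pci/devices/`. The sketch below assumes that layout; `pci_devices_by_node` and its `sysfs` parameter are hypothetical names, and mapping a PCI address back to a physical slot still needs the board documentation or similar.

```python
from collections import defaultdict
from pathlib import Path


def pci_devices_by_node(sysfs=Path("/sys")):
    """Map NUMA node id -> list of PCI addresses reported on that node.

    A node id of -1 means the kernel has no locality information
    for that device (common on single-socket machines and in VMs).
    """
    groups = defaultdict(list)
    devices_dir = Path(sysfs) / "bus" / "pci" / "devices"
    for dev in sorted(devices_dir.glob("*")):
        try:
            node = int((dev / "numa_node").read_text().strip())
        except (OSError, ValueError):
            continue  # skip entries without a readable numa_node file
        groups[node].append(dev.name)
    return dict(groups)
```

Calling `pci_devices_by_node()` and printing the result gives a quick table of PCI addresses per node, which you can then match against `lspci` output to see which NIC is which.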
Same. The drop in performance can be surprisingly bad. 10Gbps becomes 5Gbps. 100Gbps becomes 20Gbps.
When building Edera (the product from the article), I also had the added problem of the virtual networking gap: I was bridging a 10Gbit NIC over a virtual interface, and I had weird performance bouncing between 3Gbit and the full 10Gbit. Luckily I had built networking drivers before and knew the complexities involved, and managed to profile it down to the virtual interface occasionally getting worst-case NUMA placement.
Part 2 is going to cover how we actually solved it, which involves every part of the system being NUMA-aware. NUMA is so easy to ignore, but it has a massive impact on perf.
(CTO of Edera here)
Great point! We try to factor that in as well.
Steven (the author) will cover that in part 2!