Comment by jeffbee

3 hours ago

My question: why do mainstream users tolerate NUMA? 99% of you don't need to. Single-socket servers exist and they are not only tolerable but better in most ways. Dealing with NUMA in software consists of trying to logically partition the machine, but you can instead physically partition the machine. It's so much simpler!

Amazon gets this. Except for the 4th generation, their Graviton systems are not NUMA.

2 comments

toast0 2 hours ago

NUMA latencies across machines are way worse than across sockets or across core complexes. :p

Single socket doesn't necessarily get you away from NUMA anyway. AMD server sockets are 4-way NUMA (you can set them to interleave, but you could do better with NUMA-aware software), and I think Intel is doing NUMA within a server socket as well.
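
For anyone who wants to see what "NUMA-aware" starts from on Linux, here is a minimal sketch that lists the NUMA nodes the kernel exposes and the CPUs in each. It assumes the usual sysfs layout under /sys/devices/system/node; the function names `parse_cpulist` and `numa_nodes` are made up for illustration, not from any library.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a sysfs cpulist string such as '0-3,8-11' into a sorted list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. a node with memory but no CPUs
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return sorted(cpus)


def numa_nodes(root="/sys/devices/system/node"):
    """Map NUMA node id -> CPU ids. Returns {} if the directory is absent."""
    nodes = {}
    base = Path(root)
    if not base.is_dir():
        return nodes
    for entry in base.glob("node[0-9]*"):
        cpulist = entry / "cpulist"
        if cpulist.is_file():
            # Directory names look like 'node0', 'node1', ...
            nodes[int(entry.name[4:])] = parse_cpulist(cpulist.read_text())
    return nodes


if __name__ == "__main__":
    for node, cpus in sorted(numa_nodes().items()):
        print(f"node {node}: {len(cpus)} CPUs")
```

A process could then be pinned to one node's CPUs with `os.sched_setaffinity(0, cpus)` (Linux only). That only restricts where threads run; it doesn't bind memory, so treat it as a starting point rather than full NUMA awareness.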

A lot of people like to take one big machine and partition it into several smaller virtual machines. In that case, it shouldn't be too hard to partition VMs into NUMA zones? Only VMs that are too big to fit in one zone have to worry about it (or ones that need to be repacked into a different zone).
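
A rough sketch of what I mean, as a first-fit-decreasing packer. The VM names, the 4 zones, and the 16 cores per zone are all made-up numbers for illustration, and `place_vms` is a hypothetical helper, not how any particular hypervisor does it.

```python
def place_vms(vm_sizes, zone_count, zone_capacity):
    """First-fit-decreasing placement of VMs (by vCPU count) into equal NUMA zones.

    Returns (placement, leftover): placement maps VM name -> zone index;
    leftover lists VMs that found no single zone with enough free cores,
    either because they are larger than a zone or because the zones filled up.
    """
    free = [zone_capacity] * zone_count
    placement = {}
    leftover = []
    # Largest first, so big VMs claim a zone before small ones fragment it.
    for name, size in sorted(vm_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        for zone, room in enumerate(free):
            if size <= room:
                free[zone] -= size
                placement[name] = zone
                break
        else:
            leftover.append(name)
    return placement, leftover


if __name__ == "__main__":
    # Hypothetical workload: vCPU counts per VM, 4 zones of 16 cores each.
    vms = {"db": 48, "cache": 16, "batch": 12, "web": 8}
    placement, leftover = place_vms(vms, zone_count=4, zone_capacity=16)
    print(placement)  # {'cache': 0, 'batch': 1, 'web': 2}
    print(leftover)   # ['db']
```

Only the leftover VMs (here the 48-core one) would need to span zones and care about NUMA; everything else sits inside one zone.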

jeffbee 2 hours ago

AMD's NPS4 mode isn't exactly user-friendly, I agree. But you can put it into NPS1 mode and relax. Graviton 5, as a counterpoint, doesn't give you the option. Physically there is a 2D mesh between the cores and the memory controller, but the observable behavior is that every access gets the average mesh fabric latency. The efficiency you leave on the table isn't very large, whereas in multisocket NUMA you can't ignore the cost.

I think you can over-analyze this stuff and lose your sanity. On these multicore systems there are also hot cores in the center of the mesh and cold ones at the edges, and theoretically you could do temperature-aware scheduling and gain a bit more efficiency. But it's just easier to adopt the black box model of spherical frictionless CPUs.