← Back to context

Comment by Agingcoder

3 hours ago

Yes there are scheduling issues, Numa problems , etc caused by the cluster in a box form factor.

We had a massive performance issue a few years ago that we fixed by mapping our processes to the numa zones topology . The default design of our software would otherwise effectively route all memory accesses to the same numa zone and performance went down the drain.

Intel contributes to Linux, how is this a problem?

  • Wrong level of abstraction. NUMA is an additional layer. If the program (script, whatever) was written with a monolithic CPU in mind then the big picture logic won't account for the new details. The kernel can't magically add information it doesn't have (although it does try its best).

    Given current trends I think we're eventually going to be forced to adopt new programming paradigms. At some point it will probably make sense to treat on-die HBM distinctly from local RAM and that's in addition to the increasing number of NUMA nodes.

  • Often the Linux scheduling improvements come a year or two after the chip. Also, Linux makes moment-by-moment scheduling and allocation decisions that are unaware of the big picture of workload requirements.