Comment by mwpmaybe

2 years ago

It's amazing how much "optimization" you can achieve on modern systems simply by fencing process(es) to run within a cache region or within a NUMA node. Even my homelab server, with one P-core cluster and two E-core clusters, benefits massively from a few simple cpusets that keep each process running within a single cluster to mitigate context-switching costs. Each P-core has its own L2 cache, but the cores in each E-core cluster share one L2. Unfortunately, all three clusters share the last-level cache (LLC), so there's only so much you can do.
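For anyone wanting to try something similar: the comment uses cpusets (cgroups). A lighter-weight way to fence a single process is plain CPU affinity. Below is a minimal, Linux-only sketch of that alternative. It assumes the kernel exposes cache topology under `/sys/devices/system/cpu/cpuN/cache/index*/` (the `level` and `shared_cpu_list` files). It groups CPUs by the L2 they share and then pins the calling process to one group. The helper names `parse_cpu_list`, `l2_groups`, and `pin_to_cpus` are hypothetical, made up for illustration.

```python
import glob
import os


def parse_cpu_list(text: str) -> set[int]:
    """Parse a Linux cpulist string such as '0-3,8,10-11' into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def l2_groups() -> list[frozenset[int]]:
    """Group CPUs by the L2 cache they share, as reported by sysfs.

    Returns an empty list if sysfs cache information is unavailable
    (e.g. some containers or non-Linux systems).
    """
    groups: set[frozenset[int]] = set()
    for cache_dir in glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cache/index*"):
        try:
            with open(os.path.join(cache_dir, "level")) as f:
                if f.read().strip() != "2":
                    continue  # only interested in L2 caches
            with open(os.path.join(cache_dir, "shared_cpu_list")) as f:
                cpus = parse_cpu_list(f.read())
        except OSError:
            continue
        if cpus:
            groups.add(frozenset(cpus))
    # Sort by lowest CPU id so the output order is stable.
    return sorted(groups, key=min)


def pin_to_cpus(pid: int, cpus: set[int]) -> set[int]:
    """Restrict pid (0 = calling process) to the requested CPUs it is allowed to use."""
    target = cpus & os.sched_getaffinity(pid)
    if not target:
        raise ValueError("none of the requested CPUs are available to this process")
    os.sched_setaffinity(pid, target)
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    allowed = os.sched_getaffinity(0)
    groups = l2_groups()
    for i, group in enumerate(groups):
        print(f"L2 group {i}: CPUs {sorted(group)}")
    usable = [g for g in groups if g & allowed]
    if usable:
        print("pinned to", sorted(pin_to_cpus(0, set(usable[0]))))
```

On a machine like the one described, each P-core should appear as its own L2 group (or share one with its SMT sibling), and each E-core cluster should appear as a single group. Grouping by the LLC instead would collapse everything into one group, which is exactly the limitation the comment points out.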