Comment by skavi

16 hours ago

Modern tcmalloc uses per-CPU caches via rseq [0]. We use async Rust with multithreaded Tokio executors (sometimes multiple in the same application), so we have relatively high thread counts.

[0]: https://github.com/google/tcmalloc/blob/master/docs/design.m...

How do you control which CPU your task resumes on? If you don't, then it's still the same problem described above, no?

  • On the OS scheduler side, I'd imagine there's some stickiness that keeps tasks from jumping wildly between cores; I'd expect migration to be modeled as a non-zero cost. Complete speculation, though.

    On the Tokio scheduler side, the executor is thread-per-core, and work stealing of in-progress tasks shouldn't be happening too much.

    For thread-pool threads, or threads unaffiliated with the executor, see the earlier speculation on OS scheduler behavior.

    • Correct. The Linux scheduler has been NUMA-aware and sticky for a while (which is more or less what this reduces to in common scenarios).