Comment by skavi

16 hours ago

Modern tcmalloc uses per-CPU caches via rseq [0]. We use async Rust with multithreaded Tokio executors (sometimes multiple in the same application), so we have relatively high thread counts.

[0]: https://github.com/google/tcmalloc/blob/master/docs/design.m...

How do you control which CPU your task resumes on? If you don't, then it's still the same problem described above, no?

  • On the OS scheduler side, I'd imagine there's some stickiness that keeps tasks from jumping wildly between cores; I'd expect migration to be modeled as a non-zero cost. Complete speculation, though.

    On the Tokio scheduler side, the executor is thread-per-core, and work stealing of in-progress tasks shouldn't be happening too much.

    For thread-pool threads, or threads unaffiliated with the executor, see the earlier speculation on OS scheduler behavior.

    • Correct. The Linux scheduler has been NUMA-aware and sticky for a while (which is more or less what this reduces to in common scenarios).