Comment by bob1029

5 hours ago

And avoid moving said data between physical threads as much as possible.

Most of the bottlenecks I see are not due to the organization of data. Unnecessary communication of data is the #1 offender.

Working set and algorithm diagonalization (work independence) FTW. Immutable data structures and copying often help to avoid cache invalidation penalties.
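A rough sketch of what I mean by work independence, in Rust. The function name `partitioned_sum` and the choice of a sum as the workload are mine, purely for illustration. Each worker gets its own disjoint chunk of a read-only slice, accumulates into a thread-local variable, and hands back a single value at the end, so no cache line is written by more than one thread while the work is running.

```rust
use std::thread;

/// Hypothetical example: sums `data` using up to `workers` threads.
/// Each thread reads only its own chunk and writes only its own local
/// accumulator; the single point of communication is the final join.
fn partitioned_sum(data: &[u64], workers: usize) -> u64 {
    let workers = workers.max(1);
    // Ceiling division so every element lands in exactly one chunk.
    // The `.max(1)` guards against a zero chunk size on empty input.
    let chunk_len = ((data.len() + workers - 1) / workers).max(1);

    thread::scope(|s| {
        // Collect the handles first so all threads are spawned
        // before any of them is joined.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| {
                s.spawn(move || {
                    // Thread-local accumulator: never shared, so no
                    // other core invalidates this cache line.
                    let mut local = 0u64;
                    for &x in chunk {
                        local += x;
                    }
                    local
                })
            })
            .collect();

        // One value per thread crosses the thread boundary.
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .sum::<u64>()
    })
}
```

Calling `partitioned_sum(&data, 4)` on a `Vec<u64>` splits it into at most four chunks. The thing to avoid is the version where every thread bumps one shared atomic or mutex-guarded counter per element: same result, but the counter's cache line bounces between cores on every update.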