
Comment by adwn

10 hours ago

Cache contention is (mostly) orthogonal to your locking strategy. If anything, fine-grained locking has the potential to reduce cache contention, because

1) the mutex byte/word is more likely to be in the same cache line as the data you want to access anyway, and

2) different threads are more likely to write to mutex bytes/words in different cache lines, whereas in coarse-grained locking, different threads will fight for exclusive access over the cache line containing that one, global mutex.

@magicalhippo: Since I'm comment-rate-throttled, here's my answer to your question:

Typically, you'd artificially increase the size and alignment of the structure:

    use std::sync::Mutex;

    #[repr(align(64))]
    struct Status {
        counter: Mutex<u32>,
    }

This struct now has an alignment of 64 and is also 64 bytes in size (instead of just the 4+1 required for the Mutex<u32>), which guarantees that it has a cache line to itself. This is wasteful from a memory perspective, but can be worth it from a performance perspective. As is often the case with optimization, whether this makes your program faster or slower depends very heavily on the specific workload.

> different threads are more likely to write to mutex bytes/words in different cache lines

If you have small objects and sequential allocation, that's not a given, in my experience.

Like in my example, the ints could be allocated one per thread to indicate some per-thread status, and the main UI thread wants to read them every now and then, hence they're protected by a mutex.

If they're allocated sequentially, the mutexes end up sharing cache lines, which leads to effective contention (false sharing) even though there's almost no "actual" lock contention.

Yes yes, for a single int you might want to use an atomic variable, but this is just for demonstration purposes. I've seen this play out in real code several times, where instead of ints it was, say, a couple of pointers.

I don't know Rust though, so just curious.

  • The issue might be allocating the ints contiguously in the first place. No language magic is going to help you avoid thinking about mechanical sympathy.

    And allocating the ints contiguously might actually be the right solution, if the cost of sporadic false sharing is less than the cost of wasting memory.

    There's no silver bullet.

    • But the mutex encapsulates the int, so if the mutex ensured it occupied a whole number of cache lines, there would be no contention, at the very small cost of a few bytes of memory.
