← Back to context

Comment by pfundstein

5 years ago

Never head of that, though I don't use C much. Are you referring to the CPU cache?

Yeah, linked lists are bad for the data cache since each element is in some totally random area of memory and thus less likely to be in a cache. Whereas for a linear array the data is next to each other and can be cached effectively and accesses can be easily predicted.

  • Some 20 years ago, when I was a kernel developer with high performance requirements, my favorite data structure was a linked list of smallish arrays. Each array was small enough to be cheap to completely rewrite on insert/delete, if you needed to e.g. preserve insertion order; most of the time, it was enough to preserve order only across the linked list, treat each array like a hash table slot, or sort at consumption time.

    • I’ve considered making a vector sorta thing implemented as a linked list of arrays. Assuming you sized the arrays to fit well inside the CPU cache, are there really many tradeoffs?

      3 replies →

  • You can also force-align (with padding if necessary) your structures to be along whatever boundaries make sense for your processor's cache lines.

    Ensuring an oft-loaded structure always fits inside a single cache line (instead of splaying across ~1.9 cache lines on average) is good not only for fitting lots of those structures into your L1 but for not blowing out cache bandwidth (read _and_ write, if you're modifying them).

    • It's much more difficult to prefetch them, though - if you're traversing a large list it misses the cache every time, rather than just the first element.

      (someone please correct me if I'm wrong)

      19 replies →

  • Expanding on this: usually linked lists allocate each element with a separate malloc() on the heap, so they end up all over the place in memory.

    You could, in principle, allocate your linked list out of contiguous memory in such a way that sequentially reading the list doesn't cause so many page accesses. That would itself be a complicated data structure to maintain, so simply using a linear array is often the right tradeoff.

    • That's just a slab allocator. They're not what I'd call trivial, but still simple enough to be common in small projects and CS homework. I agree that an array is still generally simpler though.

      1 reply →