Comment by saagarjha

5 years ago

Yeah, linked lists are bad for the data cache since each element sits in some essentially random area of memory and is therefore less likely to already be in cache. A linear array, by contrast, keeps its data adjacent, so it can be cached effectively and its access pattern is easy to predict.

Some 20 years ago, when I was a kernel developer with high performance requirements, my favorite data structure was a linked list of smallish arrays. Each array was small enough to be cheap to completely rewrite on insert/delete, if you needed to e.g. preserve insertion order; most of the time, it was enough to preserve order only across the linked list, treat each array like a hash table slot, or sort at consumption time.
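The "linked list of smallish arrays" idea can be sketched as an unrolled linked list in C. This is a hypothetical minimal version, not the original kernel code; `NODE_CAP` is a placeholder you would tune by benchmarking:

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative sketch of a "linked list of smallish arrays"
   (an unrolled linked list). NODE_CAP is a tuning knob you
   would size against the cache and then benchmark. */
#define NODE_CAP 16

typedef struct Node {
    struct Node *next;
    size_t len;              /* elements used in vals[] */
    int vals[NODE_CAP];
} Node;

/* Append in insertion order; grows the chain when the tail
   node is full. Returns the (possibly new) tail node. */
static Node *ull_push(Node *tail, int v) {
    if (tail == NULL || tail->len == NODE_CAP) {
        Node *n = calloc(1, sizeof *n);
        if (tail) tail->next = n;
        tail = n;
    }
    tail->vals[tail->len++] = v;
    return tail;
}

/* Ordered insert at position pos within one node: rewriting a
   16-element array with memmove is cheap, per the comment above.
   Caller guarantees n->len < NODE_CAP and pos <= n->len. */
static void node_insert(Node *n, size_t pos, int v) {
    memmove(&n->vals[pos + 1], &n->vals[pos],
            (n->len - pos) * sizeof n->vals[0]);
    n->vals[pos] = v;
    n->len++;
}
```

Appending touches only the tail node, and an ordered insert rewrites at most one small array, which is the "cheap to completely rewrite on insert/delete" property described above.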

  • I’ve considered making a vector sorta thing implemented as a linked list of arrays. Assuming you sized the arrays to fit well inside the CPU cache, are there really many tradeoffs?

    • Even if the array overflows the cache, it's not too bad for most access patterns. CPUs will easily prefetch data on a linear scan, so the "second half of the array" will most definitely be in cache by the time you get there. And a linear scan-once shouldn't even evict too much other stuff out of the cache.

      In practice, I would let benchmarks decide the array size. Modern CPUs are surprisingly good at prefetching even linked lists, and x86 works weird out-of-order magic and speculation while waiting for memory. My two rules of thumb were: 1) where performance really matters, don't trust your intuition; program and benchmark alternative implementations; 2) don't trust microbenchmarks, e.g. total icache pressure changes whether inlining is good or not.

    • That's how Clojure implements vectors, except it's a tree of 32-element arrays. This has nice functional-programming properties because it makes it cheap to produce updated immutable copies of large vectors with good performance and memory characteristics.

    • It sounds like you are talking about an unrolled linked list[1]. While they are better than plain linked lists in just about every way (except for data objects larger than a cache line), in most cases there is a better data structure to use. I'd argue the hashed array tree[2] is usually the better, more versatile choice, solely due to its improved lookup time.

      [1]: https://en.wikipedia.org/wiki/Unrolled_linked_list

      [2]: https://en.wikipedia.org/wiki/Hashed_array_tree
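For comparison, here is a hedged sketch of the hashed array tree's lookup path: a top-level directory of pointers to fixed-size, power-of-two blocks, giving O(1) indexing with a shift and a mask. Growth/shrink policy is omitted and the block size is an arbitrary placeholder:

```c
#include <stdlib.h>

/* Minimal hashed-array-tree-style lookup sketch. The O(1)
   index math is the "improved lookup time" advantage over
   walking an unrolled linked list. Growth logic omitted. */
#define HAT_SHIFT 4                     /* 16 elements per block */
#define HAT_BLOCK (1u << HAT_SHIFT)
#define HAT_MASK  (HAT_BLOCK - 1)

typedef struct {
    int   **blocks;   /* top-level directory of leaf blocks */
    size_t  nblocks;
    size_t  len;
} Hat;

/* O(1) element lookup: one shift, one mask, two loads. */
static int *hat_slot(Hat *h, size_t i) {
    return &h->blocks[i >> HAT_SHIFT][i & HAT_MASK];
}

/* Append, allocating a new leaf block when the last one fills. */
static void hat_push(Hat *h, int v) {
    if ((h->len & HAT_MASK) == 0) {     /* last block is full */
        h->blocks = realloc(h->blocks,
                            (h->nblocks + 1) * sizeof *h->blocks);
        h->blocks[h->nblocks++] = malloc(HAT_BLOCK * sizeof(int));
    }
    *hat_slot(h, h->len++) = v;
}
```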

You can also force-align (with padding if necessary) your structures to be along whatever boundaries make sense for your processor's cache lines.

Ensuring an oft-loaded structure always fits inside a single cache line (instead of splaying across ~1.9 cache lines on average) is good not only for fitting lots of those structures into your L1 but for not blowing out cache bandwidth (read _and_ write, if you're modifying them).
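A hedged C11 sketch of that force-alignment, assuming the common 64-byte x86 line size (the struct layout here is purely illustrative):

```c
#include <stdalign.h>
#include <stdint.h>
#include <assert.h>

#define CACHE_LINE 64   /* typical on x86; check your CPU */

/* Aligning the first member aligns every instance, and sizeof
   rounds up to a multiple of the alignment, so an array of
   these never splits one entry across two cache lines. */
typedef struct {
    alignas(CACHE_LINE) uint64_t key;
    uint32_t count;
    uint32_t flags;
} HotEntry;

static_assert(sizeof(HotEntry) == CACHE_LINE,
              "entry must occupy exactly one cache line");
```

The trade-off is visible in the numbers: this layout spends 48 padding bytes per entry to buy the single-line guarantee.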

  • It's much more difficult to prefetch them, though - if you're traversing a large list it misses the cache every time, rather than just the first element.

    (someone please correct me if I'm wrong)

    • That's right. The reason you see linked lists so much in old C code is (IMO) similar to the reason you see null-terminated strings (and the associated bugs when you truncate incorrectly): both stem from one of C's original sins, the failure to include size information in array objects.

      There is also the claim that linked lists are better in certain types of performance sensitive code because you can sometimes avoid allocating. I don't fully understand the logic there myself but I trust that there are cases where this is true.
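One concrete reading of that claim is the intrusive linked list, common in kernels: the link fields live inside the object itself, so putting an already-existing object on a list allocates nothing, which matters where malloc is forbidden or can fail. A minimal illustrative sketch (the names are made up):

```c
#include <stddef.h>

/* Intrusive singly linked list: the link lives inside the
   object, so linking an existing object requires no malloc. */
typedef struct ListHead {
    struct ListHead *next;
} ListHead;

typedef struct Task {
    int      id;
    ListHead link;      /* embedded node: no separate allocation */
} Task;

/* Push a node onto the front of the list. */
static void list_push(ListHead *head, ListHead *node) {
    node->next = head->next;
    head->next = node;
}

/* Recover the containing object from its embedded link,
   in the style of the Linux kernel's container_of. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
```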


Expanding on this: usually linked lists allocate each element with a separate malloc() on the heap, so they end up all over the place in memory.

You could, in principle, allocate your linked list out of contiguous memory so that reading it sequentially doesn't touch so many distinct cache lines and pages. But that would itself be a complicated data structure to maintain, so simply using a linear array is often the right tradeoff.

  • That's just a slab allocator. They're not what I'd call trivial, but still simple enough to be common in small projects and CS homework. I agree that an array is still generally simpler though.

    • Once you are implementing your own allocator you need to worry about fragmentation under many operation patterns, and that is where most of the complexity lies.
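For the special case where every list node is the same size, fragmentation largely disappears, which is why the fixed-size pool (the degenerate slab case) stays simple. A hypothetical sketch:

```c
#include <stdlib.h>

/* Fixed-size pool backing list nodes: one contiguous slab,
   with freed nodes kept on an internal free list. Same-sized
   objects sidestep fragmentation, and sequential allocations
   land next to each other in memory. */
typedef struct PoolNode {
    struct PoolNode *next;  /* free-list link while unallocated */
    int value;
} PoolNode;

typedef struct {
    PoolNode *slab;   /* the contiguous backing array */
    PoolNode *free;   /* head of the free list */
} Pool;

/* Carve one slab of n nodes and thread the free list through it. */
static Pool pool_create(size_t n) {
    Pool p;
    p.slab = malloc(n * sizeof *p.slab);
    for (size_t i = 0; i + 1 < n; i++)
        p.slab[i].next = &p.slab[i + 1];
    p.slab[n - 1].next = NULL;
    p.free = p.slab;
    return p;
}

/* Pop a node from the free list; NULL when exhausted. */
static PoolNode *pool_alloc(Pool *p) {
    PoolNode *n = p->free;
    if (n) p->free = n->next;
    return n;
}

/* Return a node to the front of the free list. */
static void pool_free(Pool *p, PoolNode *n) {
    n->next = p->free;
    p->free = n;
}
```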