Comment by reificator

5 years ago

> If your primary operation is inserting at a random location in the list, linked lists are faster than arrays at large sizes. You avoid having to move all the memory after the index you are modifying (to make space for the inserted element).

This is false in practice. Big O notation says it should be true, and you'll get marked wrong if you say arrays are faster on your algorithms & data structures final, but when you're running on actual hardware the array is faster at all sizes of n, and the performance gap widens as n grows.

Here is a talk[0] by Bjarne Stroustrup (creator of C++) that even includes imaginary graphs demonstrating this phenomenon. If you want a visual for what the missing graph was supposed to look like, here's a similar one.[1]

Here's another video[2] by Scott Meyers (Author of Effective C++) that goes into more detail about why this happens.

[0]: https://www.youtube.com/watch?v=YQs6IC-vgmo

[1]: https://airspeedvelocity.files.wordpress.com/2015/08/pasted_...

[2]: https://www.youtube.com/watch?v=WDIkqP4JbkE

Summary of the Stroustrup video: inserting or removing an item at a point in a 100k-item sequence needs a linear scan from the start of the data structure to reach that point. An array does that very quickly; a linked list is much slower, with lots of random memory indirection and a pointer for every list node blowing out the cache. In practice this linear scan to the change point dominates the runtime, and arrays come out much faster.

While the array change does need on average 50k items to be shuffled up to make room or close the gap, modern caches are very good at that.

If the array is sorted it can be binary searched to get to the change point, which improves its performance even more; linked lists can't do that.

Interesting.

  • I can definitely see array scanning beating pointer chasing on a modern CPU, but I wouldn't have expected that advantage to survive an insertion that shuffles 50k elements. Wow!

    And if you're not doing ordered insertion you wouldn't have to move the data in the array anyway; you'd keep track of the size and jump to the end. So I'm not sure I understand the binary search comment.

    The next question is at what level of growth the waste of empty space in the array becomes too much. Some kind of data structure (tree or linked list) with largish chunks (whatever size suits a modern CPU), as probably mentioned in other comments, does seem like the most versatile approach while keeping the performance. Or perhaps the overhead of managing that structure might overwhelm the array's advantage?

    • > > If the array is sorted it can be binary searched to get to the change point, which improves its performance even more, linked lists can’t do that.

      > And if you not doing ordered insertion you wouldn't have to move the data in the array anyway, you would keep track of the size and jump to the end, so not sure I understand the binary search comment.

      It just means that if I want to view the nth element of an array, that's a constant time operation. I just take the pointer and add n times the size of the elements.

      But for a linked list if I want to view the nth element of the list, I have to view the (n-1)th element first, all the way back until the first element I have a reference to.

      > The next question is at what level of growth the waste of empty space in the array becomes too much.

      Everything I've tried, and everything I've seen from people testing on real hardware, shows that the gap in performance widens with larger values of n.

      You might expect different performance as the system runs out of memory. But arrays have a size of n * element size, and linked lists have a size of n * (pointer(s) + element size), so the linked list would hit memory limitations more quickly regardless.


I think it's also worth noting that this is specific to (common) CPU architectures where the cache is sufficiently large and orders of magnitude faster than main memory, and where main-memory reads and writes have similar cost.

The tradeoff in moving memory at the end of the array vs scanning to the point of linked list insertion could be different on some existing and some hypothetical systems where these concepts may be relevant. For example, storage write cycle minimization could be a priority in rare cases.

That said this is an excellent point about the majority of use cases. I hadn't considered the dominance of the time it takes to locate the element to modify. I've never had a reason to benchmark a linked list since even a basic idea of the effect of cache/prefetch precludes its use in every use case I've actually come across.

Probably useful for CS education to start comparing linked lists to bubble sort. There's always a better algorithm, but here's a simple one to learn so you understand how to compare data structure performance.

Doesn’t this depend on the size of each element? What if each element is 1 MiB?

  • Are you reading the whole 1 MiB in linear order? If so, then the prefetcher should help in exactly the same way.

    If you're not, then why is all of that data packed together? Is there an alternate layout you could use instead where you iterate over the exact values you need? If performance is important to you in that context it might be worth it.

    • Right - I suppose any large associated data that does not need to be searched can be malloc()'d (via standard or fancy allocator) and you just store a pointer. Or if the other data is relatively small, just in another array.

      One case where I don't know what to think: the implications of memmoving the subarray on NUMA cache invalidation in an extremely multithreaded application.