Comment by adrianmonk

2 hours ago

Because quicksort is like a worst case scenario for branch prediction.

Branch prediction works when the CPU can detect some kind of pattern in the program behavior, like if the branch usually goes one way and rarely goes the other way. If it predicts correctly, the CPU pipeline keeps going without stalling. If it predicts incorrectly, the CPU wastes time doing work that it has to throw away. Therefore, being able to predict correctly is essential.

In quicksort, you partition an array into two halves based on whether each element is smaller or larger than the pivot value. The key to good quicksort performance is to choose a good pivot which divides the values in half about equally. This means (assuming you're sorting randomly-ordered data) that the branching behavior will be totally unpredictable, like flipping a coin. So branch prediction will be basically useless.

Copying small chunks of data is OK if it all fits in cache, especially if it all fits in L1 cache. It's not ideal to copy data unnecessarily, but if it allows you to keep the CPU pipeline running at full speed with no stalls, it can be a good trade-off.