Comment by orlp

21 hours ago

Since pdqsort (an older project of mine) was mentioned, I felt it wouldn't be entirely inappropriate to mention that I've since then collaborated with Lukas Bergdoll to provide two high-quality sort implementations for the Rust standard library, ipnsort (unstable) and driftsort (stable).

So if you use Rust, you get these by simply calling [T]::sort(_unstable). Great performance out of the box :)

On my machine (Apple M2), using the benchmarks from the repository on Apple clang 17 and Rust 1.98 nightly:

    Sorting 50 million doubles:
    ipnsort             0.79s
    blqs                0.90s
    driftsort           1.13s   (stable)
    std::sort           1.22s
    std::stable_sort    4.64s   (stable)

    Sorting 50 million (i32, i32) structs:
    ipnsort             0.82s
    blqs                0.89s
    driftsort           1.07s   (stable)
    std::sort           3.09s
    std::stable_sort    3.15s   (stable)

And now for a cool party trick, let's repeat the 50 million doubles experiment again, but have the first 90% already sorted, last 10% random:

    driftsort           0.29s   (stable)
    ipnsort             0.81s
    std::sort           1.15s
    std::stable_sort    1.63s   (stable)
    blqs                1.89s

This is very impressive work.

I looked at your paper[0] and was curious why it was named "drift" sort. Even searching for 'drift' didn't show me. I mainly ask because this is noted as a stable sort and the word 'drift' implies movement; I did not expect it, from the name, to be a stable sort.

  • It's called driftsort because it's derived from another sort I made, glidesort: https://github.com/orlp/glidesort. Glidesort is a bit faster still for large inputs, however it was too large and complex for inclusion in the standard library, and suffered from code size penalties on small inputs. So driftsort is a slimmed down version more appropriate for general purpose.

Thanks for everything Orson! I know Clang struggled to ship improved sorts for their C++ implementation, so it's a good sign that Rust was able to ship ipnsort and driftsort without too much chaos.

Also, Lukas looked over my `misfortunate` crate (which provides "perverse" implementations of safe Rust traits) and although misfortunate isn't intended for testing he has inspired me to improve the perverse implementations of Ord, not for testing per se but to further illustrate. It occurs to me I should point anybody reading misfortunate's documentation at your/ Lukas' work in case they actually need really nasty tests not just mild perversion.

  • Fun, but there is a mistake in https://docs.rs/misfortunate/latest/misfortunate/struct.Maxw... which claims: "yet violates the social contract of these two combined. A Maxwell is not equal to anything (even itself) but all Maxwells hash the same"

    but that's not a violation of the contract, it's just a hash collision. Hash collisions are expected to happen, so HashMap and HashSet won't "misbehave seriously", they'll just be slow (linear-time lookup) and unable to remove entries.

    The contract of Hash is that if two values are equal, then they hash to the same value. Which would be violated by doing the opposite of Maxwell: all values equal each other, but their hash differs (you can even make it random so it changes across calls on the same value!)

    • FWIW, I believe that's not a mistake, just a consistent use of (novel?) jargon. From context I infer that the paragraph is distinguishing between the _hard requirements_ of the traits (quote: "the language contract says they cannot become unsafe") and the behavior human programmers would naturally _expect_ of the traits ("the social contract").

      And yes, looking at https://docs.rs/misfortunate/latest/misfortunate/index.html and Ctrl+F'ing for "social contract," I see that usage very consistently. The entire point of Misfortunate is that its types correctly implement the language contract, while violating the "social contract" in various surprising ways. For example, causing a hash collision on every operation is a perfectly cromulent implementation of Hash, but violates the "social contract" of Hash. For another example, look at LoadLetter — throwing an error on every operation is a valid implementation of Read, but still violates the "social contract" of what a human programmer would naturally expect from something readable.

    • I take your point but I do still think it's a violation here, if I see a type which implements Hash and Eq this doesn't feel like a reasonable outcome to me.

      Do you have a better way to describe how Maxwell<T> works here? Or, failing that, an idea for a type which you feel can be perverse in a more interesting way here?

      By the way (though you should not design things where this happens) you can alway remove things from a HashMap (or HashSet) because such collections provide e.g. retain and extract_if which both take callables, your callable is given a reference to each thing in the collection and asked if you want it, they behave differently and have different intended purposes but either will let you fish out a NaN you mistakenly stored in an ordered collection for example.

  • > I know Clang struggled to ship improved sorts for their C++ implementation

    Clang has no built-in sorting algorithms. I imagine you're referring to the LLVM project's libc++? Though all common distributions of LLVM default to GCC's libstdc++.