Don't compare apples to oranges. unordered_map is so slow because it has to guarantee pointer stability and uses separate chaining, whereas open-addressing hash tables, which probe and move entries, don't have to. They are at least 2x faster.
Compare to linear probing, quadratic probing, double hashing, cuckoo hashing, or Swiss tables.
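To make the contrast concrete, here is a minimal sketch of the kind of open-addressing lookup being described, assuming a power-of-two table and a hypothetical Slot layout (the names Slot and find are made up for illustration, not taken from the article):

    #include <cstddef>
    #include <cstdint>
    #include <functional>
    #include <optional>
    #include <vector>

    // Hypothetical slot layout: keys and values live inline in one array,
    // so probing walks contiguous memory instead of chasing list nodes.
    struct Slot {
        uint64_t key = 0;
        uint64_t value = 0;
        bool occupied = false;
    };

    // Minimal linear-probing lookup over a power-of-two-sized table.
    std::optional<uint64_t> find(const std::vector<Slot>& table, uint64_t key) {
        const std::size_t mask = table.size() - 1;          // table.size() is a power of two
        std::size_t p = std::hash<uint64_t>{}(key) & mask;  // initial bucket
        while (table[p].occupied) {                         // stop at the first empty slot
            if (table[p].key == key) return table[p].value;
            p = (p + 1) & mask;                             // wrap around with a cheap AND
        }
        return std::nullopt;
    }

Separate chaining, by contrast, follows a pointer to a separately allocated node for every collision, which is where much of unordered_map's overhead comes from.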
This seems like it has causality upside down. The pointer stability and weird bucketing API of std::unordered_map happen to be consequences of its design, and you're stupidly allowed to rely on them, and so it can't be trivially replaced with a better hash table.
But most people didn't want this. std::unordered_map is just a more modern example than, say, std::vector<bool> of WG21's ability to clutch failure from the jaws of success.
The approach taken in std::unordered_map was already a historical footnote when C++98 shipped, but what's crazy is that it didn't ship in C++98; it's from C++11. The ISO document specifying this archaic hash table design as the standard C++ hash table dates from the Obama presidency, like Rust 1.0.
So while maybe it's unfair in one sense, in another sense it makes absolute sense to compare against std::unordered_map because this is what you get out of the box.
> So while maybe it's unfair in one sense, in another sense it makes absolute sense to compare against std::unordered_map because this is what you get out of the box.
That's not really a reason to omit comparisons with other better implementations.
> [pointer stability] you're stupidly allowed to rely on them
That's a bit unfair. Those are properties that can be very useful if you need them.
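For example (a contrived sketch, not from the article): references into a std::unordered_map are guaranteed to stay valid across rehashes, which a table that moves elements on rehash cannot promise.

    #include <cassert>
    #include <string>
    #include <unordered_map>

    int main() {
        std::unordered_map<std::string, int> counts;
        counts["apples"] = 1;

        // Keep a long-lived reference to an element.
        int& apples = counts["apples"];

        // Force growth and rehashing; with std::unordered_map the reference
        // above is guaranteed to remain valid. An open-addressing table that
        // relocates entries on rehash could not offer this.
        for (int i = 0; i < 10000; ++i) counts["key" + std::to_string(i)] = i;

        apples += 1;
        assert(counts["apples"] == 2);
    }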
Aside from the already mentioned comparison to unordered_map, there appears to be a bug on line 61: "p = (p + 1) & LenV". It should be mod (%) like the rest of the code.
Also, mod is slow in general, so ideally it would be replaced by a bitwise AND (&) with a power-of-two-sized map, masking with LenV - 1. A small sketch of the equivalence follows below.
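That is, assuming LenV is kept a power of two (the name LenV is taken from the quoted snippet; the rest is illustrative), masking with LenV - 1 wraps the probe index exactly like % LenV but without a division:

    #include <cassert>
    #include <cstddef>

    int main() {
        const std::size_t LenV = 64;  // hypothetical table size, a power of two
        for (std::size_t p = 0; p < LenV; ++p) {
            // (p + 1) & (LenV - 1) gives the same wrap-around as (p + 1) % LenV,
            // but compiles to a single AND instead of a division.
            assert(((p + 1) & (LenV - 1)) == (p + 1) % LenV);
        }
    }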
I understood what's going on in this article only after reading the paper. It might be good to define things a bit better in the article, or to say that the paper is a prerequisite for reading it.
The paper itself is an overview that doesn't discuss the cache-friendly optimizations in the original 1986 PhD thesis, so it feels like a hash table all the way down.
Shoutout to unordered_dense, a drop-in replacement for C++ unordered_map that uses this:
https://github.com/martinus/unordered_dense
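A rough usage sketch, assuming the header path and namespace documented in that repo's README (check the README for the exact include):

    #include <ankerl/unordered_dense.h>  // single-header library from the repo above

    #include <iostream>
    #include <string>

    int main() {
        // Largely the same interface as std::unordered_map, but backed by a
        // dense open-addressing layout with no per-node allocations.
        ankerl::unordered_dense::map<std::string, int> counts;
        counts["apples"] = 2;
        counts["oranges"] = 3;

        for (const auto& [fruit, n] : counts) {
            std::cout << fruit << ": " << n << '\n';
        }
    }

Note that, being densely stored, it gives up the reference stability discussed earlier in the thread, which is exactly the trade-off being argued about.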