Comment by tialaramex
12 hours ago
This seems like it has causality upside down. The pointer stability and weird bucketing API for std::unordered_map happen to be consequence of its design, and you're stupidly allowed to rely on them, and so it can't be trivially replaced with a better hash table.
But most people didn't want this. std::unordered_map is just a more modern example of WG21's ability to clutch failure from the jaws of success than say std::vector<bool>
The approach taken in std::unordered_map was a historical footnote when C++ 98 shipped, but what's crazy is that it didn't ship in C++ 98 it's from C++ 11. The ISO document specifying this archaic hash table design as the standard C++ hash table is from the Obama Presidency like Rust 1.0.
So while maybe it's unfair in one sense, in another sense it makes absolute sense to compare against std::unordered_map because this is what you get out of the box.
> So while maybe it's unfair in one sense, in another sense it makes absolute sense to compare against std::unordered_map because this is what you get out of the box.
That's not really a reason to omit comparisons with other better implementations.
That's entirely fair, it would be nice to compare against other popular hash tables too.
> [pointer stability] you're stupidly allowed to rely on them
that's a bit unfair. Those are properties that can be very useful if you need them.
Indeed. One of the main reasons STL defaults to a non-moving hash table is because "deleting in the middle of an iteration" was viewed as an important property. Like it or not (I do not), the entire container part of the original 1990s Stepanov STL was oriented around "iteration/iterators" as the high priority abstraction.
Trying to think like a DB engineer is probably helpful here. Point or range query workloads absolutely afford different optimization/specializations than full table scan iterations.
Any mistake here is more in thinking "one size fits all workloads". That may as much be a mistake of user communities as of language designers. (E.g. 'expert' advice like "just always use X" instead of "well, it depends...".) So, the charitable reading of tialaramex's "most people" is a weakly articulated assertion about workloads (though I absolutely agree with your specific point).
I'm not at all convinced I need to be read "charitably". I think it's just straightforwardly true that most people writing C++ with a std::unordered_map do not use this property of std::unordered_map.
In particular the "deleting arbitrary other entries while walking the unordered map" is a weird thing to be doing. It's not necessarily wrong but it's weird to need precisely this (which it so happens the std::unordered_map can do) but not e.g. inserting arbitrary stuff into the map (which might grow and thus invalidate your iterator)
Note that "Delete yourself if you want" is an operation we can afford in the otherwise more optimal containers, so if that is the only deletion you want then you again over-paid by choosing this weird container type with pointer stability.
I do not believe that Stepanov chose this particular hash table because this one had properties he specifically wanted, maybe you have a reference which can prove me wrong? I think this was the type he was most familiar with, and the STL is illustrating generic programming - an important idea - not showing off the most optimal data structures and algorithms.
2 replies →