← Back to context

Comment by gpderetta

8 hours ago

> [pointer stability] you're stupidly allowed to rely on them

that's a bit unfair. Those are properties that can be very useful if you need them.

Indeed. One of the main reasons STL defaults to a non-moving hash table is because "deleting in the middle of an iteration" was viewed as an important property. Like it or not (I do not), the entire container part of the original 1990s Stepanov STL was oriented around "iteration/iterators" as the high priority abstraction.

Trying to think like a DB engineer is probably helpful here. Point or range query workloads absolutely afford different optimization/specializations than full table scan iterations.

Any mistake here is more in thinking "one size fits all workloads". That may as much be a mistake of user communities as of language designers. (E.g. 'expert' advice like "just always use X" instead of "well, it depends...".) So, the charitable reading of tialaramex's "most people" is a weakly articulated assertion about workloads (though I absolutely agree with your specific point).

  • I'm not at all convinced I need to be read "charitably". I think it's just straightforwardly true that most people writing C++ with a std::unordered_map do not use this property of std::unordered_map.

    In particular the "deleting arbitrary other entries while walking the unordered map" is a weird thing to be doing. It's not necessarily wrong but it's weird to need precisely this (which it so happens the std::unordered_map can do) but not e.g. inserting arbitrary stuff into the map (which might grow and thus invalidate your iterator)

    Note that "Delete yourself if you want" is an operation we can afford in the otherwise more optimal containers, so if that is the only deletion you want then you again over-paid by choosing this weird container type with pointer stability.

    I do not believe that Stepanov chose this particular hash table because this one had properties he specifically wanted, maybe you have a reference which can prove me wrong? I think this was the type he was most familiar with, and the STL is illustrating generic programming - an important idea - not showing off the most optimal data structures and algorithms.

    • I tried to defend you against a hard to support "most people didn't want this" with a workload (either static or dynamic) assertion. You come back saying you don't need such charitability while simultaneously clarifying in terms of a static workload! Seemingly, "most written C++" (again charitably moving from "people writing" to "written code" because the latter is probably more relevant and surely more auditable/studyable - "source code files easier than people").

      I think you should be more careful about claiming you know what people want or know. Many people I know don't seem to know what they want.. it's some melange of desiderata with various eternal tensions. As just one example, I would doubt most people even want performance above all else. I really doubt most people understand well how their various static and dynamic workloads relate to their performance. I doubt I'm alone in having known many C++ programmers who, from spoken conversation, revealed they thought STL map was a hash table, for example.

      I only said "one of the main" and almost added "with pointer stability being the bigger", but it had already been observed by rurban. Delete-while-iterating has close ties with cell motion (though convoluted workarounds may exist, they may well cost more than you personally would like, it sounds). How to best explain all these things is always questionable, but I was trying to supplement. It's generally not easy without visual aids and definitely not if people are being combative. Python has ever & always since the late 80s used open addressing but only more recently tried to `raise` on dict edits during iterations.

      It seems to be flip-flopping that you are now calling separately chained hash tables a "weird container type" when 3 months ago it seemed to be news to you that they were NOT the main strategy for decades before open addressing on some Nim -ish comment on a Carbon thread: https://news.ycombinator.com/item?id=44754821

      Anything else I would say, such as Stepanov not even choosing a hash table in the first place, gpderetta said very well in a sibling. For the record, I still very much disagree with the early (& later) design of the STL and am no fan of WG21. Have a good weekend.

    • The primary reason you might want reference stability is because you might have another pointer to it. Reference stability is generally important for the STL and carefully documented; even for vectors stability is guaranteed on many modify operations. I have seen plenty of code that use reserve+push_back specifically to preserve stability for example.

      The primary reason for unordered_map to preserve stability is that std::map does and the ability of unordered_map to be a drop-in replacement for std::map for code that doesn't need ordering was considered at that time a better tradeoff than the absolute optimal implementation.

      You can of course build a pointer stable map on top of an unstable one by heap allocating every object and storing unique_ptrs to the objects in the map, but when hash_map was initially added to the STL (remember than unordered map is just standardizing the non-standard hash_map), there was no unique_ptr (we do not speak of auto_ptr).