Comment by Spivak
6 months ago
I mean this gets to the fundamental question of what being ordered actually means. The author decides that regardless of whether the data structure has an order, even if that order is guaranteed, it's a necessary property that order must be taken into account to determine equality. A very sensible and pure definition that says that equivalence classes should be single elements.
But this is a weird thing to actually do in Python, it's not a common thing to test dictionaries for equality like this. If you parsed some file into a dictionary and you want to preserve the order it was originally in when you write it back out then the standard dict works just fine. There's a tension in the article that if you rely on ordering you ought to consider using OrderedDict instead but that leads you to never rely on dict() ordering which is fine— but it's guaranteed to you. You're supposed to use it! The standard dict() even has a __reversed__ method so ordering is a meaningful property.
I think it was a mistake to standardize that dict will maintain insertion order. It's a nice side effect of the internal implementation, but I fear it will hinder future improvements.
The problem is that once this side effect is in the main version for a while, it becomes a feature whether you admit it or not. Since that is going to happen anyway, you might as well make it official.
The alternative approach is to do what Go did and explicitly randomize iteration order to prevent people from relying on a fixed order in the first place.
> explicitly randomize iteration
In fact, we see this with CPython `set`, controlled by `_Py_HashSecret`:
https://github.com/python/cpython/blob/6eb6c5dbfb528bd07d77b...
https://github.com/python/cpython/blob/6eb6c5dbfb528bd07d77b...
https://github.com/python/cpython/blob/6eb6c5dbfb528bd07d77b...
https://github.com/python/cpython/blob/6eb6c5dbfb528bd07d77b...
This is a genuine concern, since it hinders our ability to port over high-quality, high-performance hash table implementations from other languages (since these often do not preserve any human ordering.)
However, the ship has already sailed here. I think that once insertion-ordering became the standard, this creates a guarantee that we can't easily back down from.
The python dict() is already high-quality and high-performance. You're probably not going to be able to do much better than the current state. If you want faster you end up moving your data into an environment that lets you make more assumptions and be more restrictive with your data and operate on it there. It's how numpy, polaris, and pandas work.
Everything in Python is a dict, there's no data structure that's been given more attention.
https://m.youtube.com/watch?v=p33CVV29OG8
3 replies →
I don't believe it was because some side effect became documented.
It was because they wanted the order of the names in a class definition to match the order the were declared. The same dict implementation is used everywhere, so the standard dict acquired the behaviour.