Comment by wrsh07
8 years ago
It's a bit funny to write off a hash table that is by design competing with another hash table by saying "it's not a different data structure, so I'm not interested."
Moreover, this kind of optimization only makes sense if (1) latency, not throughput, is your bottleneck, or (2) you're really concerned about parallel accesses.
A reasonable default in general is to use a hash table + mutex until that is your bottleneck. (I've yet to encounter a scenario where switching from a hash map would provide significant gains in performance-relevant code)
Disclosure: I work at Google and use the fancy hash map discussed by Matt. I didn't work on it, though.
That's not a very charitable interpretation of the GP. He's not criticizing hash tables because they aren't radix trees. He's criticizing this and other hash table implementations because they have undesirable performance characteristics with respect to his needs, and noting an alternative data structure which he has found to be more suitable. I think that's fair game.
Note that there exist hash table implementations which support incremental resizing.
I do agree, though, that it's a leap to go from "hash tables with this property are inappropriate under these constraints" to "hash tables with this property should not be considered further", precisely because there is such a broad spectrum of data structure requirements across the industry.
> Note that there exist hash table implementations which support incremental resizing.
Rather, there exist hash table algorithms that support incremental resizing.
http://www.cs.cmu.edu/afs/cs.cmu.edu/user/christos/www/cours...
Additionally, a hash table could be implemented in such a way that it could be updated during resizing. As long as you set up locks in the right places, you could spend the extra memory to do an out-of-place rehash and allow both concurrent writers and your rehashing thread to write to the new table. It's doable, if difficult and error-prone.
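To make the incremental-resizing idea concrete, here is a minimal single-threaded sketch (my own toy code, not any particular library's implementation) of the two-table scheme used by, e.g., Redis's dict: when the table fills up, a larger table is allocated and each subsequent operation migrates a couple of buckets, so no single insert pays for a full rehash. The class name, constants, and method names are all invented for illustration.

```python
class IncrementalDict:
    """Toy chained hash table with incremental resizing: each operation
    migrates a bounded number of buckets from the old table to the new
    one, so no single insert has to rehash everything at once."""

    MIGRATE_PER_OP = 2  # buckets moved per operation (tunable)

    def __init__(self, capacity=8):
        self.new = [[] for _ in range(capacity)]  # active table
        self.old = None                           # table being drained
        self.next_bucket = 0                      # next old bucket to move
        self.size = 0

    def _step(self):
        # Perform a bounded slice of the pending rehash, if any.
        if self.old is None:
            return
        for _ in range(self.MIGRATE_PER_OP):
            if self.next_bucket >= len(self.old):
                self.old = None
                return
            for entry in self.old[self.next_bucket]:
                k, _ = entry
                self.new[hash(k) % len(self.new)].append(entry)
            self.old[self.next_bucket] = []
            self.next_bucket += 1

    def _remove(self, table, key):
        bucket = table[hash(key) % len(table)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return True
        return False

    def set(self, key, value):
        self._step()
        # Drop any stale entry; it may live in either table mid-migration.
        if self._remove(self.new, key) or (
            self.old is not None and self._remove(self.old, key)
        ):
            self.size -= 1
        # Start a new migration only once the previous one has finished.
        if self.old is None and self.size >= len(self.new):
            self.old, self.next_bucket = self.new, 0
            self.new = [[] for _ in range(2 * len(self.old))]
        self.new[hash(key) % len(self.new)].append((key, value))
        self.size += 1

    def get(self, key, default=None):
        self._step()
        for table in (self.new, self.old):
            if table is None:
                continue
            for k, v in table[hash(key) % len(table)]:
                if k == key:
                    return v
        return default
```

The trade-off is that lookups must probe both tables while a migration is in flight, and a concurrent version would still need the kind of locking discussed above.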
Radix trees are a very nice data structure. But there is one drawback I never see mentioned in comparisons: an insertion may require a non-constant amount of space, i.e., it may have to allocate more nodes than the one being inserted, because an existing node may need to be split. This means you cannot push the space allocation onto the caller of 'insert' (and the deallocation onto the caller of 'remove'), e.g. by statically preallocating a node cell in each item that could become part of a radix tree.
This matters in environments where you have no malloc/free, or do not want to use them, e.g. embedded systems — exactly the place where properties like worst-case time behaviour are important and radix trees therefore seem like an option: they still might not be.
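The split the parent comment describes is easy to see in a toy path-compressed trie (my own sketch, with an allocation counter bolted on; all names are invented). Inserting "test" allocates one leaf, but then inserting "team" must split the shared "te" prefix, allocating an internal node in addition to the new leaf — two nodes for one inserted key:

```python
class Node:
    """Path-compressed trie node; `edge` is the label on the edge into it."""
    allocated = 0  # counts every node ever constructed

    def __init__(self, edge="", is_key=False):
        Node.allocated += 1
        self.edge = edge
        self.children = {}   # first character of child edge -> child
        self.is_key = is_key

def common_prefix_len(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n

def insert(node, key):
    if not key:
        node.is_key = True
        return
    child = node.children.get(key[0])
    if child is None:
        node.children[key[0]] = Node(key, is_key=True)  # one new leaf
        return
    p = common_prefix_len(child.edge, key)
    if p == len(child.edge):
        insert(child, key[p:])  # whole edge matches; descend
        return
    # Edge only partially matches: split it. This allocates an extra
    # internal node beyond the leaf for the new key itself.
    mid = Node(child.edge[:p])
    child.edge = child.edge[p:]
    mid.children[child.edge[0]] = child
    node.children[key[0]] = mid
    insert(mid, key[p:])

def contains(node, key):
    if not key:
        return node.is_key
    child = node.children.get(key[0])
    if child is None or not key.startswith(child.edge):
        return False
    return contains(child, key[len(child.edge):])
```

Because the number of nodes an insert creates depends on the existing tree shape, the caller cannot preallocate a fixed node per item the way it could for a plain binary search tree.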
Well, but hash tables are not an option in these situations either, so good old binary search trees win (red-black, AVL, 2-4, whatever).
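This is the intrusive-node pattern (as in the Linux kernel's rbtree): the tree linkage is embedded in the item itself, so insert and remove never allocate. A minimal sketch of the idea, using a plain unbalanced BST for brevity — a real embedded system would keep the same embedded layout but add red-black or AVL rebalancing (all names here are invented):

```python
class Item:
    """An item that embeds its own tree linkage ('intrusive' node), so
    linking it into the tree requires no allocation beyond the item."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def tree_insert(root, item):
    """Link `item` into the tree rooted at `root`; returns the new root.
    Performs no allocation: it only rewires embedded pointers."""
    if root is None:
        return item
    node = root
    while True:
        if item.key < node.key:
            if node.left is None:
                node.left = item
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = item
                return root
            node = node.right

def tree_contains(root, key):
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False
```

Exactly one embedded node per item is always enough — unlike the radix-tree splits above — which is why this layout works even with no allocator at all.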