← Back to context

Comment by thaumasiotes

4 years ago

> assuming the map doesn't need resizing

This isn't a big difficulty; it's still amortized O(1).

> and there isn't a hash collision

This is a real difficulty, unless you allow map resizing. Luckily, we do.

> but it's generally not true in respect to the key length.

OK, but in most cases the key length is constant, making anything that depends on the key length O(1) by definition.

I wrote my own version of a part of a very popular Java scientific tool, and my version runs about 50 times faster. Their mistake? They had a hashCode() implementation on the objects they were using as keys for HashMaps that iterated through all of the voluminous content of that object. And there was no point - they could have used IdentityHashMaps instead with the same result. I pointed this out to them, and they still haven't fixed it.

I'm guessing GP means the complexity guarantee sidesteps the complexity of the hashing function. It probably doesn't matter all that much in typical case - I'm guessing 80-90% of hash map use is with very short strings.

  • Well-written complexity guarantees specify the operations they count. Otherwise sorting in O(n log(n)) also "sidesteps" the cost of comparison too.

    • And the analysis of hashmaps is not such a well-written guarantee -- as you resize, you need a bigger hash function output to reach all possible buckets. A bigger hash function output, assuming you have to keep the avalanche effect to keep output well-scrambled, requires more computations.

      Earlier discussion: https://news.ycombinator.com/item?id=9807739

  • Short strings, long strings; they're going to use the same key length. Calculating the key may take longer for the long string, if you're basing the hash on the contents of the string[1], but the key won't end up being a different size. The md5 of a 3-byte string is 16 bytes and the md5 of a 40GB string is also 16 bytes.

    [1] Not typical. e.g. Java takes the hash key of an object to be its address in memory, which doesn't require looking at the contents.

    • Calculating the key may take longer for the long string

      Right, that’s exactly what they are warning about.

      Not typical. e.g. Java takes the hash key of an object to be its address in memory

      No, that’s just the base implementation in Object (and arguably it was a bad idea). All useful “value type” classes will override it with a real hash of the content, including String.

      There are some cases in Java where you do want to use IDs instead of values as your map keys, but they’re rare.

      7 replies →

If the key length is constant, the map as an upper limit on the number of possible elements, so all operations are constant time.