Comment by alexchamberlain
4 years ago
It makes me chuckle when hash maps are stated to have O(1) insertion. That's true with respect to the number of items in the map, assuming the map doesn't need resizing and there isn't a hash collision... but it's generally not true with respect to the key length. (I think most implementations are worst-case O(l*n), where l is the length of the key and n is the number of inserted items, assuming the hash function is O(l) - the _amortised_ runtime would be O(l).)
> assuming the map doesn't need resizing
This isn't a big difficulty; it's still amortized O(1).
> and there isn't a hash collision
This is a real difficulty, unless you allow map resizing. Luckily, we do.
> but it's generally not true in respect to the key length.
OK, but in most cases the key length is constant, making anything that depends on the key length O(1) by definition.
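(On the resizing point above: with the usual table-doubling strategy, the insert that triggers a resize rehashes everything, but across n inserts the total rehashing work is roughly n/2 + n/4 + n/8 + ... < n, i.e. O(1) extra per insert on average - or O(l) extra if each rehash of a length-l key costs O(l). That assumes geometric growth of the table; a grow-by-a-fixed-amount policy wouldn't give you this.)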
I wrote my own version of a part of a very popular Java scientific tool, and my version runs about 50 times faster. Their mistake? They had a hashCode() implementation on the objects they were using as keys for HashMaps that iterated through all of the voluminous content of that object. And there was no point - they could have used IdentityHashMaps instead with the same result. I pointed this out to them, and they still haven't fixed it.
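Roughly the pattern being described - the class name, fields, and values below are made up for illustration, not the actual tool's code:

    import java.util.HashMap;
    import java.util.IdentityHashMap;
    import java.util.Map;

    // Hypothetical key class standing in for the tool's objects.
    final class Spectrum {
        final double[] samples;  // the "voluminous content"

        Spectrum(double[] samples) { this.samples = samples; }

        // Content-based hash: every put/get walks the whole array, i.e. O(l).
        @Override public int hashCode() {
            int h = 1;
            for (double d : samples) {
                long bits = Double.doubleToLongBits(d);
                h = 31 * h + (int) (bits ^ (bits >>> 32));
            }
            return h;
        }

        @Override public boolean equals(Object o) {
            return o instanceof Spectrum
                    && java.util.Arrays.equals(samples, ((Spectrum) o).samples);
        }
    }

    class Demo {
        public static void main(String[] args) {
            Spectrum key = new Spectrum(new double[1_000_000]);

            Map<Spectrum, String> slow = new HashMap<>();   // hashes all contents on every access
            slow.put(key, "result");

            // If lookups always use the same object reference, identity hashing
            // gives the same answers without ever touching the contents.
            Map<Spectrum, String> fast = new IdentityHashMap<>();
            fast.put(key, "result");

            System.out.println(slow.get(key) + " / " + fast.get(key));  // result / result
        }
    }

The IdentityHashMap route only works because, as described, lookups are always done with the very same object instances; if logically-equal-but-distinct objects need to find each other, content-based hashing (ideally cached) is unavoidable.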
I'm guessing GP means the complexity guarantee sidesteps the complexity of the hashing function. It probably doesn't matter all that much in the typical case - I'm guessing 80-90% of hash map use is with very short strings.
Well-written complexity guarantees specify the operations they count. Otherwise sorting in O(n log(n)) "sidesteps" the cost of comparisons too.
Short strings, long strings: they're going to end up with the same hash length. Computing the hash may take longer for the long string, if you're basing the hash on the contents of the string [1], but the hash won't end up being a different size. The MD5 of a 3-byte string is 16 bytes, and the MD5 of a 40 GB string is also 16 bytes.
[1] Not typical. E.g., Java's default Object.hashCode() is identity-based (commonly derived from the object's address), which doesn't require looking at the contents.
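A tiny sketch of that fixed-output-size point (any digest would do; MD5 just matches the example above):

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;

    class DigestSize {
        public static void main(String[] args) throws Exception {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] small = md5.digest("abc".getBytes(StandardCharsets.UTF_8));
            byte[] large = md5.digest(new byte[10_000_000]);  // stand-in for a huge input
            // Both digests are 16 bytes, even though computing the second one
            // still had to read all ten million input bytes.
            System.out.println(small.length + " / " + large.length);  // 16 / 16
        }
    }

Computing the digest still reads every input byte, of course - only the output size is fixed.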
If the key length is constant, the map has an upper limit on the number of possible elements, so all operations are constant time.
Key length must necessarily be O(log(N)) to be able to identify N different keys.
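(A quick counting argument behind that: with an alphabet of s symbols, keys of length l can name at most s^l distinct values, so telling N keys apart forces s^l >= N, i.e. l >= log_s(N).)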
This is O(1) where N is constant.
But isn't the key length a constant, so we're back to O(1)? OK, in theory you could exhaust all possible keys of a certain length and move on to longer keys. What would that give us? O(ln(n))?
His point is that if you use Moby Dick as the key, it's going to take longer to hash it than a three-letter string. Hashing isn't O(1) if the key has variable size.
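A crude way to see that with Java's own String.hashCode(), which walks every character - this is a one-shot timing rather than a proper benchmark (no JIT warmup), and String.repeat needs Java 11+:

    class HashCost {
        public static void main(String[] args) {
            String small = "abc";
            String mobyDickish = "x".repeat(10_000_000);  // stand-in for a very long key

            long t0 = System.nanoTime();
            int h1 = small.hashCode();
            long t1 = System.nanoTime();
            int h2 = mobyDickish.hashCode();  // walks all ten million chars (then caches the result)
            long t2 = System.nanoTime();

            System.out.println("short key: " + (t1 - t0) + " ns, long key: " + (t2 - t1)
                    + " ns (hashes " + h1 + ", " + h2 + ")");
        }
    }

Since String caches its hash code, a HashMap only pays that O(l) cost once per String instance - but it does pay it.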