Comment by wahern, 8 years ago

If the use case is write-once, read-many, you can simply build a perfect hash table. Using techniques like CHD, the cost is IME usually in the range of 1x (i.e., no extra cost) to 3x, depending on how compact you want the table, how large it is, and how much time you're willing to spend up front.
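
For the curious, the core of CHD is "hash and displace": bucket the keys with one hash, then search, per bucket, for a displacement seed that maps the whole bucket into still-free slots. Below is a minimal sketch in C, assuming distinct 64-bit keys; the names (mix, phf_build, phf_slot), the mixer, and the parameters are my own inventions for illustration, not phf's actual API, and it omits CHD's compression of the displacement array:

  #include <stdint.h>
  #include <stdlib.h>

  /* Mixer based on the MurmurHash3 finalizer; `seed` doubles as the
   * per-bucket displacement. Illustrative, not phf's actual hash. */
  static uint64_t mix(uint64_t key, uint64_t seed)
  {
      uint64_t h = key ^ (seed * 0x9e3779b97f4a7c15ULL);
      h ^= h >> 33; h *= 0xff51afd7ed558ccdULL;
      h ^= h >> 33; h *= 0xc4ceb9fe1a85ec53ULL;
      return h ^ (h >> 33);
  }

  struct phf {
      uint64_t *disp;              /* one displacement per bucket */
      size_t nbuckets, nslots;
  };

  /* Lookup: the first hash picks a bucket; that bucket's stored
   * displacement yields a collision-free slot. Two hash calls, O(1). */
  static size_t phf_slot(const struct phf *p, uint64_t key)
  {
      size_t b = mix(key, 0) % p->nbuckets;
      return mix(key, p->disp[b]) % p->nslots;
  }

  struct bucket { size_t id, n; uint64_t *keys; };

  static int by_size_desc(const void *a, const void *b)
  {
      size_t x = ((const struct bucket *)a)->n;
      size_t y = ((const struct bucket *)b)->n;
      return (x < y) - (x > y);
  }

  /* Build over n distinct keys; `load` (e.g. 0.90) sets compactness. */
  static void phf_build(struct phf *p, const uint64_t *keys, size_t n,
                        double load)
  {
      p->nslots = (size_t)(n / load) + 1;
      p->nbuckets = n / 4 + 1;     /* ~4 keys per bucket on average */
      p->disp = calloc(p->nbuckets, sizeof *p->disp);

      /* Partition the keys into buckets by the first-level hash. */
      struct bucket *bk = calloc(p->nbuckets, sizeof *bk);
      size_t maxb = 0;
      for (size_t i = 0; i < p->nbuckets; i++)
          bk[i].id = i;
      for (size_t i = 0; i < n; i++) {
          struct bucket *b = &bk[mix(keys[i], 0) % p->nbuckets];
          b->keys = realloc(b->keys, (b->n + 1) * sizeof *b->keys);
          b->keys[b->n++] = keys[i];
          if (b->n > maxb)
              maxb = b->n;
      }

      /* Seat the largest buckets first; they are hardest to place. */
      qsort(bk, p->nbuckets, sizeof *bk, by_size_desc);

      unsigned char *used = calloc(p->nslots, 1);
      size_t *slot = malloc((maxb ? maxb : 1) * sizeof *slot);

      for (size_t i = 0; i < p->nbuckets && bk[i].n > 0; i++) {
          struct bucket *b = &bk[i];
          for (uint64_t d = 1; ; d++) {    /* try displacements */
              size_t k;
              for (k = 0; k < b->n; k++) {
                  size_t s = mix(b->keys[k], d) % p->nslots;
                  int clash = used[s];
                  for (size_t j = 0; j < k; j++)
                      if (slot[j] == s)
                          clash = 1;
                  if (clash)
                      break;
                  slot[k] = s;
              }
              if (k == b->n) {             /* whole bucket fits */
                  p->disp[b->id] = d;
                  while (k-- > 0)
                      used[slot[k]] = 1;
                  break;
              }
          }
          free(b->keys);
      }
      free(slot);
      free(used);
      free(bk);
  }

The higher the load factor, the fewer free slots remain for the last buckets, so the displacement search runs longer; that is the shape of the timing curve below.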

Examples from my own small library (http://25thandclement.com/~william/projects/phf.html), using 1 million random integers (load factor -> construction time):

  80% load -> 0.47s
  85% load -> 0.56s
  90% load -> 0.74s
  95% load -> 1.10s
  98% load -> 1.35s
  99% load -> 1.68s

If I hash the items as C++ string objects rather than uint32 scalars, the time doubles, presumably because of the dynamic allocation. Which goes to show that the "perfect" in "perfect hash table" doesn't need to be the dominant cost.

That said, 90% of the time I just use a simple red-black tree, specifically a so-called intrusive implementation (sketched below), where the tree links are embedded in the element itself. That makes memory management much easier in C, and it's rare that a hash table (let alone a perfect hash table) makes a real difference compared to fixing all the other inefficiencies.
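
To make that concrete, here's a minimal sketch of the intrusive pattern using BSD's <sys/tree.h> (shipped on the BSDs and macOS; on Linux it's available via libbsd). Because the links live inside the element, each element is exactly one malloc/free and the container never allocates behind your back; the struct names and keys here are just illustrative:

  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/tree.h>   /* on Linux: <bsd/sys/tree.h> from libbsd */

  struct item {
      RB_ENTRY(item) link;   /* tree pointers embedded in the element */
      int key;
      /* ... payload ... */
  };

  static int itemcmp(struct item *a, struct item *b)
  {
      return (a->key < b->key) ? -1 : (a->key > b->key);
  }

  RB_HEAD(itemtree, item);
  RB_GENERATE_STATIC(itemtree, item, link, itemcmp)

  int main(void)
  {
      struct itemtree tree = RB_INITIALIZER(&tree);

      /* One malloc per element covers payload and tree links alike;
       * error checking omitted for brevity. */
      for (int i = 0; i < 10; i++) {
          struct item *it = malloc(sizeof *it);
          it->key = (i * 7) % 10;
          RB_INSERT(itemtree, &tree, it);
      }

      /* Search with a stack-allocated query element: no separate key
       * type, no allocation just to look something up. */
      struct item query = { .key = 3 };
      struct item *hit = RB_FIND(itemtree, &tree, &query);
      printf("found: %d\n", hit ? hit->key : -1);

      /* Teardown is a single walk: unlink and free each element. */
      for (struct item *it = RB_MIN(itemtree, &tree); it != NULL; ) {
          struct item *next = RB_NEXT(itemtree, &tree, it);
          RB_REMOVE(itemtree, &tree, it);
          free(it);
      }
      return 0;
  }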