Comment by wahern
8 years ago
If the use case is write-once, read-many, you can simply build a perfect hash table. Using techniques like CHD the cost is IME usually in the range of 1x (no cost) to 3x, depending on how compact you want the table, how large it is, and how much time you want to spend up front.
Examples from my own small library (http://25thandclement.com/~william/projects/phf.html), using 1 million random integers:
80% load -> 0.47s
85% load -> 0.56s
90% load -> 0.74s
95% load -> 1.10s
98% load -> 1.35s
99% load -> 1.68s
If I hash the items as C++ string objects rather than uint32 scalars, the time roughly doubles, presumably because of the dynamic allocation. Which goes to show that the "perfect" in perfect hash table doesn't need to be the dominant cost.
That said, 90% of the time I just use a simple red-black tree--a so-called intrusive implementation. It makes memory management much easier in C, and rare is the situation where a hash table (let alone a perfect hash table) can make a real difference relative to focusing on all the other inefficiencies.