Comment by ztorkelson

8 years ago

Gotcha. Yes, while there are implementation techniques that can help (e.g. arena allocation, prefix compression, adaptive node sizes), the data structure is fundamentally a tree, and so one can expect a higher number of random memory accesses (when compared with suitably optimized hash tables, like those discussed in the article).

In some scenarios, radix trees can overcome that with better space efficiency (i.e. implicit key storage), but I think it's fair to say that there are many scenarios (if not most!) where hash tables outperform radix trees.

Of course, that all hinges on how you measure performance. As Salvatore pointed out, if you're looking to minimize variance, hash tables might not be suitable due to their growth characteristics. Radix trees also give you efficient sorting and range scans out of the box, and are a viable immutable persistent data structure as well. If none of those are meaningful to your use cases, then hash tables could certainly be a better fit.

Arena-allocation? As in (roughly speaking) a Stack<Node> data structure whose lifetime is bound to the lifetime of the tree, right?

How could we support deletion of nodes?

  • This is getting deep into some wizardry, but if you used a custom pool-like allocator you could group contents close together in cache, and, if necessary/appropriate, even rearrange the elements in the container to be more cache-friendly for your purposes.

    • You could, but the premise of using a tree was to avoid unpredictable rehashing latency, if you start compacting the tree every now and then, you basically pay the same price.

  • Right. When you ask "how could we support deletion of nodes", am I right to assume you mean "how could we reclaim memory of deleted nodes"?

    Conventionally, a free list would be used, though admittedly that sacrifices locality which was one of the original motivating factors. To the extent that deletion is a significant component of the workload, then yeah, this doesn't help.

    • Yes, exactly. It would of course be trivial to unlink a node without reclaiming it, but then you're leaking.

      Radix trees never rotate, right? Can that be leveraged to help with locality?