
Comment by mgaunard

20 days ago

Why doesn't it use k-d trees or r-trees?

The big reason is that H3 is data-independent. You put your data in predefined bins and then join on them, whereas k-d trees and R-trees depend on the data, and building the trees may become prohibitive or very hard (especially in distributed systems).
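A minimal sketch of the "predefined bins, then join" idea, assuming a hypothetical square latitude/longitude grid as a stand-in for H3's hexagons (this does not use the real H3 library). `cell_size_deg` plays the role of H3's resolution, and `cell_id` and `bin_join` are illustrative names. The point is that a cell ID depends only on a point's own coordinates, never on the rest of the data, so each side of the join can be binned independently.

```python
from collections import defaultdict
from math import floor


def cell_id(lat: float, lon: float, cell_size_deg: float = 1.0) -> tuple[int, int]:
    """Map a point to a grid cell using only its own coordinates.

    Hypothetical square lat/lon grid standing in for H3 hexagons;
    cell_size_deg plays the role of H3's resolution.
    """
    return (floor(lat / cell_size_deg), floor(lon / cell_size_deg))


def bin_join(left, right, cell_size_deg: float = 1.0):
    """Hash-join two lists of (name, lat, lon) records on their shared cell ID.

    Both sides must use the same cell_size_deg, just as two H3 datasets
    must agree on a resolution before they can be joined.
    """
    # Build phase: bucket the right-hand records by cell.
    buckets = defaultdict(list)
    for name, lat, lon in right:
        buckets[cell_id(lat, lon, cell_size_deg)].append(name)

    # Probe phase: look up each left-hand record's cell.
    pairs = []
    for name, lat, lon in left:
        for other in buckets.get(cell_id(lat, lon, cell_size_deg), []):
            pairs.append((name, other))
    return pairs


stores = [("store_a", 48.85, 2.35), ("store_b", 40.71, -74.0)]
stations = [("station_x", 48.20, 2.90), ("station_y", 51.50, -0.12)]
print(bin_join(stores, stations))  # [('store_a', 'station_x')]
```

Only `store_a` and `station_x` share a cell, `(48, 2)`. Matching on exact cell equality misses pairs that straddle a cell boundary; a proximity join would also check neighbouring cells.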

  • Indices are meant to depend on the data, yes; that's not exactly rocket science.

    Updating an R-tree is O(log n), just like any other index.

    • I think the key is the distributed nature: H3 is effectively a grid, so it can easily be distributed over nodes. A recursive structure is much harder to handle that way. R-trees are great if you are OK with indexing all data on one node, which I think is a no-go for a global system.

      This is all speculation, but intuitively your criticism makes sense.

      Also, mapping 147k cities to countries should not take 16 workers and 1TB of memory, I think the example in the article is not a realistic workload.

    • To add to the sibling comment: with streaming data, R-trees and k-d trees require you to update the index on every insert, whereas with H3 you just compute the bin, O(1) instead of O(log n).

      Not rocket science, but different trade-offs; that's what engineering is all about.

    • How do you join two datasets using R-trees? In a business setting, having a static and constant projection is critical. As long as you agree on a zoom level, joining two datasets with S2 or H3 is really easy.

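The streaming and distribution arguments in the replies above can be sketched the same way, again assuming a hypothetical square grid in place of H3. Each incoming point is assigned to a cell in constant time, and the cell ID alone decides which node stores it, via a stable hash. `node_for_cell`, `Node`, and `route` are illustrative names, not part of any library. Nothing about the existing data is consulted or rebalanced on insert, which is the contrast with inserting into an R-tree or k-d tree.

```python
import hashlib
from collections import defaultdict
from math import floor


def cell_id(lat: float, lon: float, cell_size_deg: float = 1.0) -> tuple[int, int]:
    # Hypothetical square grid standing in for H3: O(1) per point.
    return (floor(lat / cell_size_deg), floor(lon / cell_size_deg))


def node_for_cell(cell: tuple[int, int], num_nodes: int) -> int:
    # Stable hash: the same cell maps to the same node in every process,
    # so producers agree on ownership without coordinating.
    digest = hashlib.sha256(repr(cell).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


class Node:
    """Hypothetical storage node holding record IDs grouped by cell."""

    def __init__(self) -> None:
        self.cells = defaultdict(list)

    def ingest(self, cell: tuple[int, int], record_id: str) -> None:
        # Append to a per-cell list: no tree to rebalance.
        self.cells[cell].append(record_id)


def route(stream, nodes: list) -> None:
    """Assign each streamed (record_id, lat, lon) to a cell and ship it to the owning node."""
    for record_id, lat, lon in stream:
        cell = cell_id(lat, lon)
        nodes[node_for_cell(cell, len(nodes))].ingest(cell, record_id)


nodes = [Node() for _ in range(4)]
route([("p1", 48.85, 2.35), ("p2", 48.20, 2.90), ("p3", 40.71, -74.0)], nodes)
for i, node in enumerate(nodes):
    print(i, dict(node.cells))
```

Hashing spreads cells evenly across nodes but scatters neighbouring cells; whether that matters depends on the queries you run.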