Comment by mgaunard

20 days ago

Why doesn't it use k-d trees or r-trees?

7 comments

mgaunard

The big reason is that H3 is data-independent. You put your data in predefined bins and then join on them, whereas k-d trees and R-trees depend on the data, and building the trees can become prohibitive or very hard (especially in distributed systems).
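A minimal sketch of "predefined bins, then join" follows. It assumes a plain square lat/lng grid as a stand-in for H3's hexagonal cells, and `cell_id`, `to_parent`, `bin_records`, and `join_on_cells` are hypothetical helpers, not the h3 library API. Cell boundaries depend only on the resolution, so two datasets can be indexed separately and joined later, provided both are first brought to the same resolution:

```python
import math
from collections import defaultdict

# Hypothetical stand-in for H3: a fixed square lat/lng grid whose cells at
# resolution res are 1/2**res degrees wide. Boundaries never depend on the data.
def cell_id(lat: float, lng: float, res: int) -> tuple[int, int, int]:
    size = 1.0 / (2 ** res)
    return (res, math.floor(lat / size), math.floor(lng / size))

def to_parent(cell: tuple[int, int, int], res: int) -> tuple[int, int, int]:
    """Coarsen a cell to a lower resolution (each level up halves the indices, flooring)."""
    r, i, j = cell
    if res > r:
        raise ValueError("can only coarsen to a lower resolution")
    return (res, i >> (r - res), j >> (r - res))

def bin_records(records, res):
    """Group (name, cell) records by their cell, coarsened to res."""
    bins = defaultdict(list)
    for name, cell in records:
        bins[to_parent(cell, res)].append(name)
    return bins

def join_on_cells(left, right, res):
    """Equi-join two datasets on cell id; no tree is built for either side."""
    lbins, rbins = bin_records(left, res), bin_records(right, res)
    return sorted(
        (a, b)
        for cell in lbins.keys() & rbins.keys()
        for a in lbins[cell]
        for b in rbins[cell]
    )

# Two datasets indexed independently, at different resolutions.
stores = [("store_a", cell_id(48.8566, 2.3522, 9)), ("store_b", cell_id(51.5074, -0.1278, 9))]
customers = [("cust_1", cell_id(48.8570, 2.3530, 6)), ("cust_2", cell_id(40.7128, -74.0060, 6))]
print(join_on_cells(stores, customers, res=6))
```

Unlike this square grid, H3's hexagons only approximately nest, although every H3 cell still has a defined parent. Either way, a bin join is a coarse filter: points just across a cell boundary will not match, so an exact distance or point-in-polygon check (often including neighboring cells) usually follows.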

mgaunard 20 days ago
Indices are meant to depend on the data, yes; it's not exactly rocket science.
Updating an R-tree is log(n) just like any other index.
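To make "depends on the data" concrete, here is a minimal 2-d tree sketch (a k-d tree, not an R-tree; `Node`, `insert`, and `height` are hypothetical names). Split positions come from whichever points were inserted, and an insert walks one root-to-leaf path, so its cost is the tree's depth. That depth is O(log n) only while the tree stays balanced; this toy never rebalances, whereas R-trees stay height-balanced by splitting nodes on overflow:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    point: tuple[float, float]
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def insert(root: Optional[Node], point: tuple[float, float], depth: int = 0) -> Node:
    """Insert a point; the split axis alternates between x and y with depth."""
    if root is None:
        return Node(point)
    axis = depth % 2
    if point[axis] < root.point[axis]:
        root.left = insert(root.left, point, depth + 1)
    else:
        root.right = insert(root.right, point, depth + 1)
    return root

def height(node: Optional[Node]) -> int:
    """Number of nodes on the longest root-to-leaf path (the worst-case insert cost)."""
    return 0 if node is None else 1 + max(height(node.left), height(node.right))

# Same seven points, two insertion orders: the tree's shape depends on the data.
pts = [(4, 4), (2, 2), (6, 6), (1, 1), (3, 3), (5, 5), (7, 7)]
balanced = None
for p in pts:
    balanced = insert(balanced, p)
degenerate = None
for p in sorted(pts):
    degenerate = insert(degenerate, p)
print(height(balanced), height(degenerate))
```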
- vouwfietsman 20 days ago
  
  I think the key is the distributed nature: H3 is effectively a grid, so it can easily be distributed over nodes. A recursive structure is much harder to handle that way. R-trees are great if you are OK with indexing all data on one node, which I think is a no-go for a global system.
  This is all speculation, but intuitively your criticism makes sense.
  Also, mapping 147k cities to countries should not take 16 workers and 1TB of memory; I think the example in the article is not a realistic workload.
- cpa 20 days ago
  
  To add to the sibling comment: if you have streaming data, with R-trees or k-d trees you have to update the whole index every time, whereas with H3 you just compute the bin, O(1) instead of O(log n).
  Not rocket science, but different tradeoffs; that’s what engineering is all about.
- rockinghigh 20 days ago
  
  How do you join two datasets using r-trees? In a business setting, having a static and constant projection is critical. As long as you agree on zoom level, joining two datasets with S2 and H3 is really easy.
  
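A minimal sketch of the distribution and streaming points made in the replies, again assuming a square lat/lng grid as a stand-in for H3 (`cell_id`, `owner`, and `ShardedCellCounts` are hypothetical, not the h3 API). Each incoming point is binned and routed to a worker with a fixed amount of work, and the routing is a pure function of the cell, so nodes never share or rebalance a tree:

```python
import math
import zlib
from collections import Counter, defaultdict

def cell_id(lat: float, lng: float, res: int) -> str:
    # Hypothetical square-grid stand-in for an H3 cell index (not the h3 API).
    size = 1.0 / (2 ** res)
    return f"{res}:{math.floor(lat / size)}:{math.floor(lng / size)}"

def owner(cell: str, num_nodes: int) -> int:
    # crc32 is stable across processes (unlike Python's salted str hash),
    # so every worker computes the same owner without coordination.
    return zlib.crc32(cell.encode()) % num_nodes

class ShardedCellCounts:
    """Per-cell point counts, sharded across num_nodes independent workers."""

    def __init__(self, res: int, num_nodes: int):
        self.res = res
        self.num_nodes = num_nodes
        self.shards = defaultdict(Counter)  # node index -> Counter of cell -> count

    def add(self, lat: float, lng: float) -> int:
        # Expected constant work per point: compute the bin, pick its owner,
        # bump a hash-map counter. Nothing is rebalanced as points arrive.
        cell = cell_id(lat, lng, self.res)
        node = owner(cell, self.num_nodes)
        self.shards[node][cell] += 1
        return node

agg = ShardedCellCounts(res=6, num_nodes=4)
stream = [(48.8566, 2.3522), (48.8570, 2.3530), (51.5074, -0.1278), (40.7128, -74.0060)]
nodes = [agg.add(lat, lng) for lat, lng in stream]
```

The two nearby points land in the same cell and therefore on the same worker, which is what lets each node answer per-cell queries without consulting the others.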