Comment by cpa
19 days ago
The big reason is that H3 is data independant. You put your data in predefined bins and then join on them, whereas kd/r trees depend on the data and building the trees may become prohibitive or very hard (especially in distributed systems).
Indices are meant to depend on the data yes, not exactly rocket science.
Updating an R-tree is log(n) just like any other index.
I think the key is in the distributed nature, h3 is effectively a grid so can easily be distributed over nodes. A recursive system is much harder to handle that way. R-trees are great if you are OK with indexing all data on one node, which I think for a global system is a no-go.
This is all speculation, but intuitively your criticism makes sense.
Also, mapping 147k cities to countries should not take 16 workers and 1TB of memory, I think the example in the article is not a realistic workload.
To add to sibling comment, if you have streaming data you have to update the whole index every time with r/kd trees whereas with H3 you just compute the bin, O(1) instead of O(log n).
Not rocket science but different tradeoffs, that’s what engineering is all about.
How do you join two datasets using r-trees? In a business setting, having a static and constant projection is critical. As long as you agree on zoom level, joining two datasets with S2 and H3 is really easy.
Spatial indices simply partition your data in N-dimensional space the same way a binary tree does it in 1-dimensional space.
The whole advantage over a static partition is that it will allow you to properly deal with data that is irregularly distributed.
Those data structures can definitely be merged if that's what you're asking.
1 reply →