← Back to context

Comment by mgaunard

19 days ago

Indices are meant to depend on the data yes, not exactly rocket science.

Updating an R-tree is log(n) just like any other index.

I think the key is in the distributed nature, h3 is effectively a grid so can easily be distributed over nodes. A recursive system is much harder to handle that way. R-trees are great if you are OK with indexing all data on one node, which I think for a global system is a no-go.

This is all speculation, but intuitively your criticism makes sense.

Also, mapping 147k cities to countries should not take 16 workers and 1TB of memory, I think the example in the article is not a realistic workload.

To add to sibling comment, if you have streaming data you have to update the whole index every time with r/kd trees whereas with H3 you just compute the bin, O(1) instead of O(log n).

Not rocket science but different tradeoffs, that’s what engineering is all about.

How do you join two datasets using r-trees? In a business setting, having a static and constant projection is critical. As long as you agree on zoom level, joining two datasets with S2 and H3 is really easy.

  • Spatial indices simply partition your data in N-dimensional space the same way a binary tree does it in 1-dimensional space.

    The whole advantage over a static partition is that it will allow you to properly deal with data that is irregularly distributed.

    Those data structures can definitely be merged if that's what you're asking.

    • This data is indeed not irregularly distributed, in fact the fun thing about geospatial data is that you always know the maximum extent of it.

      About your binary tree comment: yes this is absolutely valid, but consider then that binary trees also are a bad fit for distributed computing, where data is often partitioned at the top level (making it no longer a binary tree but a set of binary trees) and cross-node joins are expensive.