Comment by jandrewrogers

21 days ago

There is a lot of literature on join operations using discrete global grid systems (DGGS). H3 is a widely used DGGS optimized for visualization.

If joins are a critical performance-sensitive operation, the most important property of a DGGS is congruency. H3 is not congruent it was optimized for visualization, where congruency doesn’t matter, rather than analytical computation. For example, the article talks about deduplication, which is not even necessary with a congruent DGGS. You can do joins with H3 but it is not recommended as a general rule unless the data is small such that you can afford to brute-force it to some extent.

H3 is great for doing point geometry aggregates. It shines at that. Not so much geospatial joins though. DGGS optimized for analytic computation (and joins by implication) exist, they just aren’t optimal for trivial visualization.

16 comments

jandrewrogers

0xfaded 21 days ago

S2 has this property https://s2geometry.io

jandrewrogers 21 days ago
Yes, S2 is a congruent DGGS. Unfortunately, it kind of straddles the analytics and visualization property space, being not particularly good at either. It made some design choices, like being projective, that limit its generality as an analytic DGGS. In fairness, its objectives were considerably more limited when it was created. The potential use cases have changed since then.
- mdasen 21 days ago
  
  Is there a congruent DGGS that you would recommend?
  
  3 replies →

ajfriend 21 days ago

I agree that the lack of congruency in H3 hexagons can cause weird overlaps and gaps if you plot mixed resolutions naively, but there are some workarounds that work pretty well in practice. For example, if you have mixed resolutions from compacted H3 cells but a single “logical” target resolution underneath, you can plot the coarser cells not with their native geometry, but using the outline of their children. When you do that, there are no gaps. (Totally unrelated but fun: that shape is a fractal sometimes called a "flowsnake" or a "Gosper Island" (https://en.wikipedia.org/wiki/Gosper_curve), which predates H3 by decades.)

That said, this feels like an issue with rendering geometry rather than with the index itself. I’m curious to hear more about why you think the lack of congruency affects H3’s performance for spatial joins. Under the hood, it’s still a parent–child hierarchy very similar to S2’s — H3 children are topological rather than geometric children (even though they still mostly overlap).

jandrewrogers 20 days ago
In a highly optimized system, spatial joins are often bandwidth bound.
Congruency allows for much more efficient join schedules and maximizes selectivity. This minimizes data motion, which is particularly important as data becomes large. Congruent shards also tend to be more computationally efficient generally, which does add up.
The other important aspect not raised here, is that congruent DGGS have much more scalable performance when using them to build online indexes during ingestion. This follows from them being much more concurrency friendly.
- ajfriend 20 days ago
  
  I appreciate the reply! So, I might be wrong here, but I think we may be talking about two different layers. I’m also not very familiar with the literature, so I’d be interested if you could point me to relevant work or explain where my understanding is off.
  To me, the big selling point of H3 is that once you’re "in the H3 system", many operations don’t need to worry about geometry at all. Everything is discrete. H3 cells are nodes in a tree with prefixes that can be exploited, and geometry or congruency never really enter the picture at this layer.
  Where geometry and congruency do come in is when you translate continuous data (points, polygons, and so on) into H3. In that scenario, I can totally see congruency being a useful property for speed, and that H3 is probably slower than systems that are optimized for that conversion step.
  However, in most applications I’ve seen, the continuous-to-H3 conversion happens upstream, or at least isn’t the bottleneck. The primary task is usually operating on already "hexagonified" data, such as joins or other set operations on discrete cell IDs.
  Am I understanding the bottleneck correctly?
  
  4 replies →

TacticalCoder 20 days ago

> If joins are a critical performance-sensitive operation, the most important property of a DGGS is congruency.

Not familiar with geo stuff / DGGS. Is H3 not congruent because hexagons, unlike squares or triangles, do not tile the plane perfectly?

I mean: could a system using hexagons ever be congruent?

urschrei 20 days ago
Hexagons do tile the Euclidean plane perfectly. They are the largest of the three n-gons that do so.
- rockinghigh 20 days ago
  
  That's not true when tiling the Earth though. You need 12 pentagons to close the shape on every zoom level, you can't tile the Earth with just hexagons. That's also why footballs stitch together pentagons and hexagons.