Comment by maxxen

5 days ago

This is cool. My only worry is that the implementation complexity will prevent widespread adoption outside of maplibre. Although getting write support upstreamed into PostGIS might be all that's needed to make sure it trickles down into all the different tile servers. MVT is not the most efficient, but everything speaks protobuf and you can hack together a parser in an afternoon.
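
To make the "afternoon parser" claim concrete: once the protobuf envelope is decoded, an MVT geometry is just a flat array of command integers (command id in the low 3 bits, repeat count above, zigzag-encoded coordinate deltas as parameters). A minimal TypeScript sketch of that inner loop, following the MVT 2.1 spec (not production code, no error handling):

```ts
// Decode one MVT geometry: `cmds` is the uint32 array already pulled
// out of the protobuf `geometry` field.
// Command ids: 1 = MoveTo, 2 = LineTo, 7 = ClosePath.
function decodeGeometry(cmds: Uint32Array): number[][][] {
  const parts: number[][][] = [];
  let part: number[][] = [];
  let x = 0, y = 0;
  // Zigzag decoding maps unsigned params back to signed deltas.
  const unzigzag = (n: number) => (n >>> 1) ^ -(n & 1);
  for (let i = 0; i < cmds.length; ) {
    const id = cmds[i] & 0x7;
    const count = cmds[i] >>> 3;
    i++;
    if (id === 1 || id === 2) {            // MoveTo / LineTo
      if (id === 1 && part.length) { parts.push(part); part = []; }
      for (let j = 0; j < count; j++) {
        x += unzigzag(cmds[i++]);          // coords are cursor deltas
        y += unzigzag(cmds[i++]);
        part.push([x, y]);
      }
    } else if (id === 7 && part.length) {  // ClosePath takes no params
      part.push(part[0].slice());          // close the current ring
    }
  }
  if (part.length) parts.push(part);
  return parts;
}
```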

I've experimented a lot with vectorized encodings of geometries in DuckDB-spatial using the different nested types. You definitely do get very good compression out of the box if you already support a bunch of specialized lightweight compression algorithms. Simpler geometric properties are very fast to compute (e.g. area, length), but for anything more complex you usually need to do some pre-processing or conversion into an intermediate data structure (like creating a line-segment index for intersection checks, or a node graph for clipping), which dominates the processing time anyway. The cost of materializing the columnar format into a row-wise format and back again when doing joins or sorting is absolutely brutal on performance too, compared to just keeping geometries as serialized blobs that are easy to slice and memcpy.
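
To illustrate the "simple properties are fast" half of that tradeoff: with a struct-of-arrays layout (one flat coordinate buffer plus per-geometry offsets, the same shape Arrow list types / GeoArrow give you), something like length is a single sequential pass over contiguous memory. This is an illustrative layout, not DuckDB-spatial's actual encoding:

```ts
// Hypothetical columnar layout for a column of linestrings.
interface LineStringColumn {
  xs: Float64Array;      // all x coordinates, back to back
  ys: Float64Array;      // all y coordinates, back to back
  offsets: Uint32Array;  // geometry g spans [offsets[g], offsets[g+1])
}

// One tight loop, no per-row deserialization, no pointer chasing:
// exactly the access pattern vectorized engines are built around.
function lengths(col: LineStringColumn): Float64Array {
  const out = new Float64Array(col.offsets.length - 1);
  for (let g = 0; g + 1 < col.offsets.length; g++) {
    let sum = 0;
    for (let i = col.offsets[g]; i + 1 < col.offsets[g + 1]; i++) {
      sum += Math.hypot(col.xs[i + 1] - col.xs[i], col.ys[i + 1] - col.ys[i]);
    }
    out[g] = sum;
  }
  return out;
}
```

An intersection join gets no such loop: you build an R-tree or segment index first, and that build step reads every coordinate anyway, which is why the blob-vs-columnar distinction washes out there.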

That said, I do expect columnar encoding to work really well for rendering in the browser, where transfer speed is the big bottleneck. The paper mentions Arrow as an inspiration, but I wonder why the format isn't just based on (compressed) Arrow in its entirety? I'm not super up to speed on the Arrow ecosystem, but I know there are a couple of query engines that don't just use it internally on the CPU, but also use it to execute on the GPU. If you are going to decode the data and send it over to WebGL, you might as well do the filtering/expression evaluation there too, no? (And leverage the existing techniques/code/interop in the Arrow world.)
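
The appeal of that last point is that a columnar buffer can go to the GPU essentially verbatim, with the predicate evaluated in the shader. A minimal WebGL2 sketch of the idea (the attribute layout and names here are my assumptions, not anything from the paper):

```ts
// Upload one columnar property (a Float32 value per vertex) straight
// into a GPU buffer: the column-to-buffer step is a single copy.
function uploadColumn(gl: WebGL2RenderingContext, column: Float32Array): WebGLBuffer {
  const buf = gl.createBuffer()!;
  gl.bindBuffer(gl.ARRAY_BUFFER, buf);
  gl.bufferData(gl.ARRAY_BUFFER, column, gl.STATIC_DRAW);
  return buf;
}

// The "expression evaluation" then lives on the GPU, e.g. a filter
// predicate as a uniform threshold in the fragment shader:
const fragmentSrc = `#version 300 es
precision highp float;
in float v_property;       // forwarded per-vertex attribute
uniform float u_minValue;  // filter predicate, updated per frame
out vec4 outColor;
void main() {
  if (v_property < u_minValue) discard;   // GPU-side filtering
  outColor = vec4(0.2, 0.4, 0.8, 1.0);
}`;
```

This is the same zero-transform interop story that Arrow-native GPU engines (cuDF, for example) lean on, just applied to rendering instead of query execution.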

> My only worry is that the implementation complexity will prevent widespread adoption outside of maplibre.

I don't think the implementation is that complex. You may be underestimating the extent to which heavy users of mapping data already write their own informal, bug-ridden versions of the transforms and representations standardized in this tile format. In fact, it is not uncommon for companies to be running multiple slightly incompatible implementations of these under the hood. The practical effect could actually be to reduce the amount of code being written to do this, never mind the compatibility bugs it would address.

A key caveat is that this format is explicitly optimized for visualization. It is not optimized for efficient geospatial or spatiotemporal analytics, which may not even have visualizable output in this sense. Formats optimized for analysis make a very different set of tradeoffs.