I always appreciate the near Literate Programming completeness of Mark’s posts, and his consistency in that over years. Every post feels like one could spin up a clean VM and end up with exactly his results just by reading along.
This was great, thank you for sharing.
The dataset claims there are significantly more Citibank locations than McDonald's worldwide, which I don’t think can be correct?
It also lists over 56,000 Wildberries locations worldwide, but a quick Google search shows they are a large online retailer. I wonder what is going on with the brand POIs…
Glad you enjoyed it.
There should be enough SQL in the blog post to repurpose for extracting the Wildberries locations and seeing what they land on top of. I'd never heard of this firm before you mentioned it.
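A rough sketch of what that repurposing could look like. It uses Python's built-in sqlite3 with a few made-up rows rather than the real Overture data, and the table and column names (places, brand, category, lon, lat) are assumptions for illustration, not the schema used in the post.

```python
import sqlite3

# Toy stand-in for the places data; the rows and column names are made up.
rows = [
    ("Wildberries", "outpost", 37.62, 55.75),
    ("Wildberries", "outpost", 30.31, 59.94),
    ("Citibank", "atm", -73.99, 40.73),
    ("McDonald's", "fast_food", -0.13, 51.51),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE places (brand TEXT, category TEXT, lon REAL, lat REAL)")
con.executemany("INSERT INTO places VALUES (?, ?, ?, ?)", rows)


def brand_category_counts(con, brand):
    """Count one brand's locations per category, largest group first."""
    sql = """
        SELECT category, COUNT(*) AS n
        FROM places
        WHERE brand = ?
        GROUP BY category
        ORDER BY n DESC, category
    """
    return con.execute(sql, (brand,)).fetchall()


print(brand_category_counts(con, "Wildberries"))
```

Grouping by category is the quick way to see whether a suspiciously large brand count is made up of shops, ATMs, or pickup points.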
From Google:
> Citibank operates over 2,300 ATMs within more than 600 U.S. branches, with a total network of over 65,000 fee-free ATMs
So the 57,163 Citibank locations are probably a combination of their branches and ATMs.
Update: I reviewed AllThePlaces (ATP) a while back; they scrape company websites for store locations. They reported 68,227 locations for Wildberries. ATP is one of the sources Overture uses, but Overture seems to use only 1.55M of the records from ATP's 19M-record dataset. https://tech.marksblogg.com/alltheplaces.html
I contribute to ATP and can confirm that the author of the wildberries spider was deliberately trying to collect https://wiki.openstreetmap.org/wiki/Tag:shop%3Doutpost (online order pickup locations). It's not common for the current set of ATP spiders to capture such features. A quick search indicates that OSM doesn't appear to have tags designed to capture pickup/dropoff partnerships between retail brands, for example, an agreement that lets customers of a pet supply shop collect parcels from select fuel stations of a partner brand. So I think the author of the wildberries spider used shop=outpost as the closest tag available in OSM, and Overture Maps' filters wouldn't be able to omit these features from their dataset unless Overture Maps adds wildberries to their exclusion list.
Ideally ATP's "located_in" and "located_in:wikidata" fields would be populated for these wildberries pickup locations, making it clear the pickup location is part of a parent feature (e.g. fuel station, supermarket). These fields are specific to ATP and are not OSM fields. OSM would expect features to be merged and a hypothetical field such as "pickup_brands:wikidata=Q1;Q2;Q3" be used instead on the parent feature.
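To make the difference concrete, here is a small sketch of the two shapes. The located_in keys are the ATP fields named above and pickup_brands:wikidata is the hypothetical OSM key; the parent brand name and its Q1 identifier are placeholders, not real data.

```python
# ATP style: the pickup point is its own feature and points at its parent.
atp_pickup = {
    "brand": "Wildberries",
    "brand:wikidata": "Q24933714",
    "shop": "outpost",
    "located_in": "Example Fuel",   # made-up parent brand
    "located_in:wikidata": "Q1",    # placeholder ID, not a real mapping
}


def merge_into_parent(parent_tags, pickups):
    """Fold pickup features into the parent feature (hypothetical OSM key)."""
    merged = dict(parent_tags)
    ids = sorted({p["brand:wikidata"] for p in pickups})
    merged["pickup_brands:wikidata"] = ";".join(ids)
    return merged


# OSM style: one merged feature, with the pickup brands listed on the parent.
parent = {"amenity": "fuel", "brand": "Example Fuel", "brand:wikidata": "Q1"}
osm_style = merge_into_parent(parent, [atp_pickup])
print(osm_style["pickup_brands:wikidata"])
```

With the ATP shape a consumer can drop or keep pickup points by checking located_in; with the merged shape there is no separate feature to count at all.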
ATP can extract a much more inclusive set of features than Overture Maps, TomTom et al. care about. As Overture Maps is more opinionated about what they aggregate, they will filter out ATP-extracted features such as individual power poles, park bench seats, local-government-managed street and park trees, stormwater drain manholes, cemetery plots, weather stations, tsunami buoys, etc. I think there might be some exceptions where a feature helps TomTom et al. with their products, such as speed camera locations, national postal provider drop-off/pick-up locations within other branded retail shops, etc.
A quick overpass-turbo search for "brand:wikidata=Q24933714 in Moscow" https://overpass-turbo.eu/s/2kaO (Q24933714 being Wildberries https://www.wikidata.org/wiki/Q24933714 ) reveals that almost all locations are tagged shop=outpost https://wiki.openstreetmap.org/wiki/Tag:shop%3Doutpost which identifies them as pick-up locations for goods ordered online. I assume the dataset in the post has mostly the same locations.
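For anyone who wants to repeat this outside overpass-turbo, a minimal sketch of an equivalent Overpass QL query built in Python. This is a guess at an equivalent query, not the contents of the saved link above; matching the area on name:en is an assumption and may need adjusting, and the function name is made up.

```python
from urllib.parse import urlencode


def brand_in_area_query(wikidata_id, area_name):
    """Build an Overpass QL query for all features of a brand in a named area."""
    return (
        "[out:json][timeout:60];\n"
        f'area["name:en"="{area_name}"]->.searchArea;\n'
        f'nwr["brand:wikidata"="{wikidata_id}"](area.searchArea);\n'
        "out center;"
    )


query = brand_in_area_query("Q24933714", "Moscow")
# Public Overpass endpoint; nothing is sent here, the URL is only built.
url = "https://overpass-api.de/api/interpreter?" + urlencode({"data": query})
print(query)
```

Counting the returned elements by their shop tag would show how many are shop=outpost.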
At least where I live, Citi and Chase have 2x as many locations as McD when you count their small branches and standalone ATMs.
I really enjoy the exploration, but I wonder why people still use Mercator for data viz. E.g. Greenland is not the size of Africa...
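For a sense of scale: on a spherical Mercator map, lengths are stretched by sec(latitude), so areas are inflated by sec²(latitude). A quick sketch (the function name and the sample latitudes are my own choices):

```python
import math


def mercator_area_inflation(lat_deg):
    """Factor by which spherical Mercator inflates areas at a latitude."""
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2


for label, lat in [("equator", 0), ("60°N", 60), ("72°N, around central Greenland", 72)]:
    print(f"{label}: x{mercator_area_inflation(lat):.1f}")
```

That is a factor of 4 at 60°N and a bit over 10 at 72°N, which is why Greenland looks comparable to Africa on these maps.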
I can't get the vector basemap to render properly in QGIS with any other projection. I love using EPSG:3301 for Europe, etc., but loads of strange things happen. Even 4326 is an issue.
Your computer is a sweet machine
Thank you! This was a good intro to using DuckDB for me, with easily relatable content and cool maps.