Show HN: I visualized the entire history of Citi Bike in the browser
1 day ago (bikemap.nyc)
Each moving arrow represents one real bike ride out of 291 million, and if you've ever taken a Citi Bike before, you are included in this massive visualization!
You can search for your ride using Cmd + K and your Citi Bike receipt, which should give you the time of your ride and start/end station.
Everything is open source: https://github.com/freemanjiang/bikemap
Some technical details:
- No backend! Processed data is stored in Parquet files on a Cloudflare CDN and queried directly by DuckDB WASM
- deck.gl w/ Mapbox for GPU-accelerated rendering of thousands of concurrent animated bikes
- Web Workers decode polyline routes and do as much precomputation as possible off the main thread
- Since only (start, end) station pairs are provided, routes are generated by querying OSRM for the shortest path between all 2,400+ station pairs
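To make the polyline decoding step concrete, here is a minimal sketch in Python (the project itself runs this in Web Workers in the browser, so this is an illustration and not the project's code). It assumes routes are stored in the standard encoded polyline format that OSRM can return, with precision 5. The precision and the function name `decode_polyline` are assumptions; OSRM can also emit precision 6.

```python
def decode_polyline(encoded: str, precision: int = 5) -> list[tuple[float, float]]:
    """Decode an encoded polyline string into a list of (lat, lon) pairs.

    Assumes the standard encoded polyline format; precision=5 is an
    assumption here (OSRM's "polyline6" geometry would need precision=6).
    """
    factor = 10 ** precision
    coords = []
    index = 0
    lat = 0
    lon = 0
    while index < len(encoded):
        deltas = []
        for _ in range(2):  # each point is a latitude delta then a longitude delta
            shift = 0
            result = 0
            while True:
                chunk = ord(encoded[index]) - 63
                index += 1
                result |= (chunk & 0x1F) << shift  # low 5 bits carry data
                shift += 5
                if chunk < 0x20:  # continuation bit not set: last chunk
                    break
            # Undo the zigzag encoding used for signed values.
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        lat += deltas[0]
        lon += deltas[1]
        coords.append((lat / factor, lon / factor))
    return coords
```

Each decoded point is stored as a delta from the previous one, which is why long routes stay compact and why decoding is worth moving off the main thread when thousands of routes are on screen at once.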
Relevant callout from https://bikemap.nyc/about:
* Limitations *
The data only contains the start and end station for each trip, but does not contain the full path. Route geometries are computed for each (start station, end station) pair using the shortest path from OSRM.
This means that the computed routes are directionally correct but inexact. Trips that start and end at the same station are filtered out since the route geometry is ambiguous.
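As a rough illustration of the preprocessing described above, the sketch below collapses trips into unique (start station, end station) pairs so that each pair only needs one route lookup, and counts how many same-station trips get dropped. The function name and the tuple layout of a trip are hypothetical, not taken from the project.

```python
from collections import Counter


def routable_pairs(trips):
    """Group trips by (start, end) station pair, skipping round trips.

    `trips` is an iterable of (start_station, end_station) tuples; this
    layout is an assumption for illustration. Returns the per-pair trip
    counts and the number of same-station trips that were excluded.
    """
    pair_counts = Counter()
    excluded = 0
    for start, end in trips:
        if start == end:
            # Route geometry is ambiguous for a trip that returns to its origin.
            excluded += 1
            continue
        pair_counts[(start, end)] += 1
    return pair_counts, excluded
```

Only the keys of `pair_counts` would need a shortest-path query, and direction matters: (A, B) and (B, A) are kept as separate pairs because one-way streets can make the two routes differ.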
This limitation comes with more interesting implications: e.g., I noticed that some bike trips are noticeably slower than average. For those I’d assume that the rider either took a detour or made a stop in between. The animation, however, makes it appear as if it was a very slow ride. It may be worth filtering out all rides that are essentially walking speed or slower.
It also would be interesting to learn how many rides had been excluded altogether, just to put things into perspective.
Yeah there is a filter between 1.2 and 20 mph
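A speed filter like this could look roughly as follows. This is only a sketch: it assumes the average speed is computed from the routed distance and the trip duration, and that both bounds are inclusive. The names `distance_m` and `duration_s` are hypothetical.

```python
MIN_MPH = 1.2   # lower bound mentioned above
MAX_MPH = 20.0  # upper bound mentioned above
METERS_PER_MILE = 1609.344


def average_speed_mph(distance_m: float, duration_s: float) -> float:
    """Average speed in miles per hour for a trip."""
    return (distance_m / METERS_PER_MILE) / (duration_s / 3600.0)


def keep_trip(distance_m: float, duration_s: float) -> bool:
    """Return True if the trip's average speed is within the allowed range.

    Assumes inclusive bounds and a routed (not actual) distance.
    """
    if duration_s <= 0:
        return False  # malformed record, speed is undefined
    return MIN_MPH <= average_speed_mph(distance_m, duration_s) <= MAX_MPH
```

Because the distance comes from the computed shortest path and not from the real ride, a trip with a long stop or detour shows up as a slow ride and may fall below the lower bound.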
Hmm, definitely too bad. Essentially fictional
This is now top of my list as one of my favorite data visualizations I've ever seen. I remember spending some time with Capital Bikeshare data in DC, which was also public at one point, though it looks like it only goes through 2016: https://capitalbikeshare.com/system-data. Would love to see the Lime/Bird version of this. Thanks for sharing.
Cool project. Thanks for sharing!
The link above points to a 404 error page on GitHub. Looks like you forgot the hyphen in the name part of the url.
I’m working with subway data, particularly the A subway line, 32 mi long with about 2 million trips over 6 months across 66 stations. Trying to train a ConvLSTM to learn the spatiotemporal propagation of train headways.
I really wish Lyft invested in maintenance. I used Citibike this week for the first time in about a year, and the Hudson River Greenway dock by NY Waterway had 1/3 of its empty docks broken with flashing red lights, then about 5 ebikes that needed service.
Are you sure that wasn't the "staggered" bike dock? It forces you to dock in the rear row if the neighboring two front row spaces are free. This is to fit more bikes. The blinking red docks aren't broken. They're intentionally unavailable.
https://www.reddit.com/r/MicromobilityNYC/comments/v457x0/9_...
Also, the 5 e-bikes probably didn't need "service", they were just waiting for battery swaps. This is by design. The docks don't charge them.
CitiBike maintenance is generally fine. They're not leaving any significant number of broken bikes or docks. I think you may have just misunderstood how it works.
Interesting that citibike publishes trip level data. The bike share schemes in Dublin only publish station counts or free bike locations. So you can see the overall pattern of bike motion, but there’s no way to see how many north side trips go to the docks vs Heuston station vs the city center.
All of the Lyft-operated systems in America publish this kind of data at least monthly.
non corrupted github link: https://github.com/freeman-jiang/bikemap.nyc
Cool visualization.
Do you find the OSRM shortest path routes probable for bikes? Not living in NYC, I expected pretty different paths. Say the "Hudson River Greenway" or whatever that's called.
This is awesome. I had no idea Lyft publishes ride data, time to explore the DC version!
This is just so cool! Not much more to add. Thanks a lot for sharing!! Great work :)
I've seen many visualizations of the citibike data over the years, this is one of the most charismatic for sure!
+1 to this comment! I used to work in this space and have similarly seen many projects and professional attempts at visualizing this kind of trip data.
This is beautifully done!
Thank you so much! That means a lot.
This is really nice. One request: when searching for a station name, let me type "and" instead of "&", e.g. typing "E 47th St and 2 Ave" would still return "E 47th & 2 Ave".
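One way to support this would be to normalize both the query and the station names before matching. The sketch below is an assumption about how that could work, not how the site's search is implemented; `normalize_station_query` is a hypothetical helper.

```python
import re


def normalize_station_query(text: str) -> str:
    """Normalize a station search string so "and" and "&" match each other."""
    text = text.lower()
    # Replace the standalone word "and" only; "Grand St" must stay intact.
    text = re.sub(r"\band\b", "&", text)
    # Ensure exactly one space on each side of "&".
    text = re.sub(r"\s*&\s*", " & ", text)
    # Collapse any remaining runs of whitespace.
    return re.sub(r"\s+", " ", text).strip()
```

Applying the same function to the indexed station names and to the typed query means "E 47th St and 2 Ave" and "E 47th St & 2 Ave" compare equal.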
It says “entire history” but seems to start at Jan 1, 2025?
How was the data gathered? Do they just publicly show the bikes' locations?
https://bikemap.nyc/about
That's cool it actually came from citibike
How is MapBox going for this free tool? Is it costing you money?
It definitely will if it blows up more. I'm willing to eat it for now because I think it's art that more people should see!
Is MapLibre GL a cheaper (free?) open source alternative?
Cool stuff btw. I’m trying to visualize weather model data myself (millions of points) at https://futureradar.net and have been researching client-side techniques like yours.
It is very cool art!
It's often interesting to observe the different ways that privacy is approached in the US and Europe.
In Europe we often accept pretty grave restrictions of our liberty like the UK's Online Safety Act, which would never fly in the US, and we do so without much public comment.
On the other side of things, organisations in the US happily expose datasets like this one, which would give most EU Data Protection Officers a heart attack, and nobody bats an eyelid.
This data is mandated by NYC law: https://intro.nyc/local-laws/2015-99
I've heard that releasing these sorts of data sets helps competitors do market research, and thus mitigates "winner takes all" forces. NYC also tends to be fairly pro-public-datasets: https://data.cityofnewyork.us/browse?%3BsortBy=most_accessed...
In Lyft's defense, they are providing it anonymized under the NYCBS Data Use Policy. They also aren't providing the exact GPS routes, which is why OSRM is used to calculate the shortest path instead.
I don't see anything problematic about start-end pairs from one public facility to another.
Awesome work!