This year, I took the Hubway visualization and created a version for Chicago, with a few improvements. You can view it here.
In short: this visualization compares the median time it took Divvy riders to ride between two stations (a station pair) with the time the same trip would have taken by public transportation (or on foot, if walking was faster), for Divvy trips taken in 2014.
The screenshot above shows that nearly all of the 5,489 Divvy trips to and from the Divvy station at Clark St & Leland Ave were faster than the same trips by public transportation. We can infer some or all of the following:
- Bicycling in Chicago is usually faster than taking public transportation, even on Divvy’s heavy and somewhat slow bicycles
- Divvy riders are making trips that they would otherwise either
  - take by public transportation or on foot, and are therefore saving time, and/or
  - not make at all, but now make because a new, faster option is available
There are certainly other conclusions we can draw from this data; please share your own!
Reading the Data
Because there is so much data, the visualization initially displays only the top 1,000 station pairs by number of rides. Selecting “All Stations” from the drop-down list at the top loads all pairs, but this takes a while. Selecting individual stations is much faster, and more insightful because there are fewer data points.
The radius of each circle on the graph is based on the total time saved for that station pair (number of trips × minutes saved per trip).
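The post does not say exactly which function maps total time saved to a circle's radius. The sketch below assumes a square-root scale (the idea behind D3's square-root scales), so that a circle's area, rather than its radius, grows in proportion to total time saved. The function names and the 30-pixel maximum radius are hypothetical, and taking the absolute value for pairs where time was lost is also an assumption.

```python
import math


def total_minutes_saved(trips: int, minutes_saved: float) -> float:
    """Total time saved for a station pair: number of trips x minutes saved per trip."""
    return trips * minutes_saved


def circle_radius(total_saved: float, max_total: float, max_radius: float = 30.0) -> float:
    """Hypothetical square-root scale: circle *area* grows linearly with |total_saved|.

    Pairs where Divvy was slower (negative totals) are sized by magnitude (an assumption).
    """
    if max_total <= 0:
        return 0.0
    return max_radius * math.sqrt(abs(total_saved) / max_total)
```

With a square-root scale, a pair with a quarter of the largest total gets half the maximum radius, which keeps a few very popular pairs from visually swamping the rest.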
A map is provided to show the start and end station for each pair. Readers unfamiliar with Chicago’s street names will find the data easier to read with the map. Small blue dots are Divvy stations, white dots are CTA ‘L’ stations, and the coloured lines are CTA ‘L’ lines. The silver-coloured lines represent track shared by multiple lines (e.g. the Loop).
Processing the Data
Getting to this point was a long process that involved a lot of data cleaning. To help explain the data, here is the process I followed (as best I can remember it):
- Clean Divvy’s trip data file in Excel (formatting dates correctly, etc.).
- Import all of Divvy’s trip and station location data into a MySQL database.
- Run a database query to count the trips between each station pair (there are a total of 58,087 pairs among the 300 stations).
- Write and run a PHP script that calls the Google Directions API to get both transit and bicycle directions for every station pair. These are the same directions Google Maps returns, except that the results are stored in my database instead of displayed on a screen. For consistency, transit trips are calculated for noon on a Monday. Bicycle directions are requested to obtain the on-street distance between stations.
- Write and run another PHP script to get the median Divvy trip duration for every station pair. Getting the average trip duration would be faster thanks to MySQL’s built-in AVG() function, but in this case the median is a better measure of central tendency. Divvy riders, especially 24-hour passholders (e.g. tourists), take longer trips than someone would on their own bicycle (according to Google, anyway). These outliers can skew the average trip duration, so the median was chosen instead.
- Export the data as JSON for the D3 script to read. The D3 script only displays station pairs where the start station differs from the end station and the time “lost” by taking Divvy is no more than 10 minutes (that is, the “minutes saved” is greater than or equal to -10). I chose not to display trips where the time “lost” is 20, 30, or 40 minutes because these make up a very small share of trips and throw off the Y-axis.
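The counting and median steps in the list above can be sketched as follows. This is not the original PHP/MySQL code: it uses Python’s built-in sqlite3 module as a stand-in for MySQL, and the table name, column names, and five toy trips are hypothetical (not real Divvy data). The toy pair (1, 2) includes one 40-minute outlier, which shows why the median was preferred over the mean.

```python
import sqlite3
import statistics
from collections import defaultdict

# In-memory SQLite stands in for the MySQL database (an assumption);
# the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (from_station INTEGER, to_station INTEGER, duration_s INTEGER)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [(1, 2, 300), (1, 2, 320), (1, 2, 2400), (2, 1, 310), (1, 1, 900)],
)

# Count trips for every (start, end) pair that has at least one trip.
counts = {
    (a, b): n
    for a, b, n in conn.execute(
        "SELECT from_station, to_station, COUNT(*) FROM trips "
        "GROUP BY from_station, to_station"
    )
}

# MySQL has AVG() but no built-in MEDIAN(), so group the durations per pair
# and compute the median outside the database.
durations = defaultdict(list)
for a, b, d in conn.execute("SELECT from_station, to_station, duration_s FROM trips"):
    durations[(a, b)].append(d)

medians = {pair: statistics.median(ds) for pair, ds in durations.items()}
means = {pair: statistics.mean(ds) for pair, ds in durations.items()}
```

For the toy pair (1, 2), the single 2,400-second ride pulls the mean above 1,000 seconds, while the median stays at 320 seconds, close to the two typical rides.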
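The directions step might look roughly like the Python sketch below (the original used PHP). It only builds request URLs for the Google Directions API’s JSON endpoint and reads the route duration and distance from an already-parsed response; it makes no network calls. The helper names and placeholder coordinates are hypothetical, and the fixed UTC-6 offset used in the usage example is an assumption: a real script should use a proper time zone such as zoneinfo.ZoneInfo("America/Chicago") so daylight saving time is handled, and it needs a valid API key.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

DIRECTIONS_URL = "https://maps.googleapis.com/maps/api/directions/json"


def next_monday_noon(now: datetime) -> datetime:
    """Return the next Monday at 12:00 strictly after `now`, in now's time zone."""
    days_ahead = (0 - now.weekday()) % 7  # Monday is weekday() == 0
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=12, minute=0, second=0, microsecond=0
    )
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


def directions_url(origin, destination, mode, api_key, departure=None):
    """Build a Directions API request URL; origin/destination are (lat, lng) tuples."""
    params = {
        "origin": f"{origin[0]},{origin[1]}",
        "destination": f"{destination[0]},{destination[1]}",
        "mode": mode,  # "transit" or "bicycling"
        "key": api_key,
    }
    if departure is not None:
        # departure_time is expressed in whole seconds since the Unix epoch.
        params["departure_time"] = int(departure.timestamp())
    return f"{DIRECTIONS_URL}?{urlencode(params)}"


def route_duration_seconds(response: dict) -> int:
    """Duration of the first leg of the first route, in seconds."""
    return response["routes"][0]["legs"][0]["duration"]["value"]


def route_distance_meters(response: dict) -> int:
    """Distance of the first leg of the first route, in meters."""
    return response["routes"][0]["legs"][0]["distance"]["value"]


if __name__ == "__main__":
    central = timezone(timedelta(hours=-6))  # fixed CST offset; an assumption
    monday = next_monday_noon(datetime.now(central))
    # Placeholder coordinates, not real station locations.
    print(directions_url((41.88, -87.63), (41.97, -87.67), "transit", "YOUR_API_KEY", monday))
```

Requesting every transit trip for the same weekday and hour keeps the comparison consistent, since transit frequency varies a lot across the week.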
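The display rule in the last step can be expressed compactly. The post applies this filter in the D3 script; the sketch below applies the same rule in Python before the JSON export, purely for illustration. The record fields, station names, and four sample pairs are hypothetical.

```python
import json

# Hypothetical per-pair records; field names are assumptions, not the original schema.
pairs = [
    {"from": "A", "to": "B", "trips": 120, "transit_min": 25.0, "divvy_median_min": 12.0},
    {"from": "A", "to": "A", "trips": 40, "transit_min": 0.0, "divvy_median_min": 30.0},
    {"from": "B", "to": "C", "trips": 3, "transit_min": 5.0, "divvy_median_min": 25.0},
    {"from": "C", "to": "A", "trips": 60, "transit_min": 8.0, "divvy_median_min": 14.0},
]


def with_minutes_saved(record: dict) -> dict:
    """Add minutes_saved = transit time - median Divvy time (negative means Divvy was slower)."""
    out = dict(record)
    out["minutes_saved"] = record["transit_min"] - record["divvy_median_min"]
    return out


def is_displayed(record: dict, max_minutes_lost: float = 10.0) -> bool:
    """Keep pairs with distinct stations and no more than 10 minutes lost by taking Divvy."""
    return record["from"] != record["to"] and record["minutes_saved"] >= -max_minutes_lost


shown = [r for r in map(with_minutes_saved, pairs) if is_displayed(r)]
payload = json.dumps(shown, indent=2)
```

Here A to A is dropped because it starts and ends at the same station, and B to C is dropped because Divvy was 20 minutes slower; C to A stays because only 6 minutes were lost.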
A Few Observations
I did not edit the data to remove outliers because this is meant to show actual trips as taken by all Divvy riders. That’s why there are anomalies like this:
It would not take most bicyclists 13 minutes to bike one block. However, both stations are next to Millennium Park and Michigan Avenue, so it is reasonable to assume that many of the riders using these stations are tourists who rode around a nearby park, went to the Lakefront Trail, etc. As a result, the median trip time is about 10 minutes longer than Google’s bike directions suggest: Google estimates this trip at 3 minutes.
The map sometimes fails to display both markers. It is supposed to show exactly two markers (green for the starting station and red for the ending station), but this does not always work. Hovering over another circle should “reset” the map so that both markers are shown again.
In the future it would be interesting to see how the time saved differs between Divvy’s annual subscribers and 24-hour passholders, or at other times of day (e.g. peak hours).
The code is not specific to Chicago. I would love to make a version for New York City while it still has a relatively small number of stations (332), and therefore fewer possible station pairs, or for Washington, D.C. If I were to make one for Paris, for example, which has 1,230 Vélib’ stations, the number of station pairs would exceed one million, and displaying that much data would crash your browser.
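The station-pair arithmetic behind this can be checked quickly: with n stations there are n(n - 1) ordered (start, end) pairs of distinct stations, so the count grows roughly with the square of the number of stations. The function name below is illustrative. For Chicago this upper bound is 89,700, above the 58,087 pairs reported earlier, which suggests (an inference, not stated in the post) that not every possible pair was actually ridden.

```python
def ordered_pairs(n_stations: int) -> int:
    """Ordered (start, end) pairs of distinct stations: n * (n - 1)."""
    return n_stations * (n_stations - 1)


for city, n in [("Chicago", 300), ("New York City", 332), ("Paris", 1230)]:
    print(f"{city}: {ordered_pairs(n):,} possible pairs")
# Chicago: 89,700 possible pairs
# New York City: 109,892 possible pairs
# Paris: 1,511,670 possible pairs
```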
The code is online at GitHub.
Please send constructive feedback and any errors you encounter to transitized [at] gmail [dot] com or tweet @transitized. Please leave your insights into the data in the comments for others to see!