Divvy Data Explorer is live

March 12, 2014 at 7:59 pm

After I created the Divvy Spokes visualization to show the start/end neighborhoods of 2013’s Divvy rides, I wanted to play around with the data a little more to see what kind of trends or patterns emerged. I’m used to using Excel pivot tables to filter data, but Excel wasn’t going to cut it with this much data. That’s where Crossfilter and dc.js came in handy.

dc.js (dimensional charting) is a JavaScript library built on Crossfilter and d3.js (the latter was used to make Divvy Spokes). Crossfilter takes a large dataset and lets you define dimensions on it, such as date, gender, or trip duration, which can then be used to filter the data quickly. dc.js then uses d3.js to draw charts from those dimensions in the browser and keeps the charts in sync as filters change. This was my first foray into Crossfilter and dc.js, so I’m pretty proud that I was able to get it up and running over a weekend (and with time to enter the Divvy Data Challenge). Enter the Divvy Data Explorer:



You can click and drag on the small chart below the trips-by-day chart at the top to select a date range. To filter the data further, click any of the bars in the month, day of week, and duration of trip graphs, click the wedges in the gender, age, and rider type charts, or click and drag on the time trip started graph. You can get very specific!

At first glance, you’ll notice a few patterns. September was a great month: many stations had been installed by then and the weather was still nice. Many people take the bikes out during the 8:00 AM and 5:00 PM (17:00) hours. There’s also a drop in rides on Sunday, September 15, that looks puzzling at first (most Sundays in September were popular riding days, but the 15th was rainy). Most rides from midnight to 4 AM are by 24-hour passholders on early Friday, Saturday, or Sunday mornings. And so on. If you trap yourself in too many filters, click the red “Reset graphs” button to the left to start from scratch.

The 760,000+ Divvy trips in the dataset actually ended up being a bit much to handle. During development, I loaded only a small subset of trips (around 10,000, chosen at random) so the graphs would load quickly. With the full 760,000+ rows, however, loading all of the charts took about 20 seconds and filtering was not “fluid”: redrawing the charts to reflect the filtered data was slow.
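For anyone who wants to try the same trick, a random development subset like the one described above can be drawn with a partial Fisher-Yates shuffle. This is only a sketch, not necessarily how the Explorer's subset was produced; the function name `sampleTrips` and its parameters are made up for illustration.

```javascript
// Hypothetical helper: returns k rows chosen uniformly at random,
// without replacement, leaving the input array untouched.
function sampleTrips(trips, k, random = Math.random) {
  const pool = trips.slice();            // copy so the caller's array is not reordered
  const n = Math.min(k, pool.length);    // cannot sample more rows than exist
  for (let i = 0; i < n; i++) {
    // Pick a random index from the not-yet-chosen tail [i, pool.length).
    const j = i + Math.floor(random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, n);               // the first n slots now hold the sample
}
```

Calling `sampleTrips(allTrips, 10000)` on the full set of rows would give a subset of roughly the size used during development.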

I ended up aggregating the data by day, gender, age group, rider type, and trip duration (in 5-minute intervals), collapsing trips that share the same values into a single row. That drastically reduced the number of data points, from over 760,000 to just under 130,000. This makes the data load much faster and should make it easier to add more trip data as Divvy releases it, at least for the next quarter or two.
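The kind of roll-up described above can be sketched in plain JavaScript. The field names (`startTime`, `gender`, `birthYear`, `userType`, `durationSec`), the 10-year age buckets, and the `count` column are all assumptions for illustration; they are not the real Divvy column names or necessarily the groupings the Explorer uses.

```javascript
// Hypothetical age bucketing: 10-year groups such as "20-29".
function ageGroup(birthYear, rideYear) {
  if (birthYear == null) return "unknown"; // e.g. riders with no birth year on file
  const lower = Math.floor((rideYear - birthYear) / 10) * 10;
  return `${lower}-${lower + 9}`;
}

// Collapse individual trips into one row per unique combination of
// day, gender, age group, rider type, and 5-minute duration bin.
function aggregateTrips(trips) {
  const buckets = new Map();
  for (const t of trips) {
    const day = t.startTime.slice(0, 10);                    // "YYYY-MM-DD"
    const gender = t.gender || "unknown";
    const age = ageGroup(t.birthYear, Number(day.slice(0, 4)));
    const durationBin = Math.floor(t.durationSec / 300) * 5; // lower edge of the bin, in minutes
    const key = [day, gender, age, t.userType, durationBin].join("|");

    let row = buckets.get(key);
    if (!row) {
      row = { day, gender, ageGroup: age, userType: t.userType, durationBin, count: 0 };
      buckets.set(key, row);
    }
    row.count += 1; // each aggregated row remembers how many trips it stands for
  }
  return Array.from(buckets.values());
}
```

With rows shaped like this, any chart that counts trips would need to sum the `count` field instead of counting rows, since each row can stand for many trips.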

If you play around with the Divvy Data Explorer for a while, leave any interesting insights you come up with in the comments below. I’m also interested in what else could be done to improve it. If you have ideas, leave them in the comments, too. If you know how to improve it and would like to contribute, check out the project’s GitHub repository.