Charel Theisen

MSc Data Science Student

Exploring New York’s taxi trips to reduce traffic congestion


Taxis, and especially Yellow Cabs, are an important part of New York City. Yellow Cabs have been driving in the city since the early 20th century (NPR, 2007). In May 2011, Uber announced that it would also start operating in New York (Uber, 2011). However, like many other cities, New York is currently facing severe traffic congestion. This year, New York’s mayor released a plan to reduce problems related to car traffic (Nir, 2017). To improve urban mobility, it is important to understand when and how people travel within the city. Therefore, New York’s transportation companies have to publish their data (Flegenheimer, 2015).
Public transportation datasets can help to understand urban mobility and to plan transit services, so that a city can improve its bus routes or add bicycle lanes where necessary (Li, 2016). This report analyses Yellow Cab data from January to March 2015, which is available on the official New York City homepage (NYC, 2015). In addition, a dataset from Uber covering the same period is used to gain a more complete understanding of New York’s taxi landscape (FiveThirtyEight, 2016). The Uber dataset only contains pickup dates and locations, whereas the Yellow Cab dataset provides more detailed information about pick-ups and drop-offs and also reveals how many passengers travelled in a taxi. Therefore, the focus lies on the Yellow Cab data, and the Uber dataset is used as an add-on to compare both services. While the raw Yellow Cab dataset has over 38 million trips, Uber has nearly 6.5 million rides in the same period.
Taxi data analysis can help to improve the lives of citizens, political decisions and policy-making (Ferreira et al, 2013). However, due to the size and complexity of the data, it is hard to perform comparative analyses. The goal of this report is therefore to get an overview of the behaviour of taxi trips in New York. By answering the questions of when and where traffic occurs, it is possible to find interactions between neighbourhoods. Consequently, this analysis detects where most traffic occurs and which routes are used most often. These insights reveal places where traffic caused by taxis can be reduced. Often, a high use of taxis implies poor public transport (Jiang et al, 2015). Therefore, the data helps to discover where public transport could be improved and thus where traffic congestion in the city could be reduced.
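To make the "when and where" questions concrete, here is a minimal sketch of how pickups per hour and the most frequent routes could be counted. It is an illustration only, not the method used in the report: the sample records, the neighbourhood labels and the function names `pickups_per_hour` and `top_routes` are all hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample records (not real data):
# (pickup timestamp, pickup neighbourhood, drop-off neighbourhood)
trips = [
    ("2015-01-05 08:15:00", "Midtown", "Upper East Side"),
    ("2015-01-05 08:40:00", "Midtown", "Upper East Side"),
    ("2015-01-05 18:05:00", "Chelsea", "Midtown"),
    ("2015-01-06 08:55:00", "Upper East Side", "Midtown"),
]


def pickups_per_hour(records):
    """Count pickups by hour of day (0-23): the 'when' question."""
    return Counter(
        datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").hour for ts, _, _ in records
    )


def top_routes(records, n=3):
    """Return the n most frequent (pickup, drop-off) pairs: the 'where' question."""
    return Counter((origin, dest) for _, origin, dest in records).most_common(n)


hourly = pickups_per_hour(trips)
routes = top_routes(trips, n=1)
```

On the full dataset of millions of trips, the same counting would more likely be done with a data-frame library or a database, but the logic stays the same.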
The first part of this report investigates New York’s traffic behaviour using the taxis’ GPS data. In a next step, more complex visualisations map the data to highlight typical routes. The resulting insights show where and when most pickups take place. The dataset was also merged with weather data (NOAA, 2017), which helps to understand whether there are any relationships between taxi rides and weather conditions. To map the coordinates of the pick-ups and drop-offs, the dataset was furthermore merged with census data from New York (FCC, 2017). This dataset links the coordinates to the corresponding neighbourhood names.
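As a rough sketch of what merging trips with weather data can look like, the example below joins each trip to the weather record of its pickup date. It assumes daily weather observations keyed by date; the field names (`pickup_datetime`, `precip_mm`, `temp_c`), the sample values and the function `merge_with_weather` are hypothetical and do not reflect the actual columns or procedure used in the report.

```python
from datetime import datetime

# Hypothetical daily weather records keyed by ISO date (illustrative values).
weather_by_date = {
    "2015-01-05": {"precip_mm": 0.0, "temp_c": -3.0},
    "2015-01-06": {"precip_mm": 4.2, "temp_c": -6.5},
}

# Hypothetical trip records (illustrative values).
trips = [
    {"pickup_datetime": "2015-01-05 08:15:00", "passenger_count": 1},
    {"pickup_datetime": "2015-01-06 08:55:00", "passenger_count": 2},
    {"pickup_datetime": "2015-01-07 10:00:00", "passenger_count": 1},
]


def merge_with_weather(trips, weather_by_date):
    """Left join: every trip is kept; weather fields are None if the day is missing."""
    merged = []
    for trip in trips:
        pickup = datetime.strptime(trip["pickup_datetime"], "%Y-%m-%d %H:%M:%S")
        day = pickup.date().isoformat()
        weather = weather_by_date.get(day, {"precip_mm": None, "temp_c": None})
        merged.append({**trip, "date": day, **weather})
    return merged


merged = merge_with_weather(trips, weather_by_date)
```

Keeping trips without a matching weather record (a left join) avoids silently dropping rides when the weather data has gaps.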

This report was a Visual Analytics coursework during my MSc Data Science. Let me know if you want to read the entire report.

