lee-data/TTC-streetcar-delay

TTC Streetcar Delay Prediction - A Data Science Approach

To watch the narrated video, click the YouTube link or the image below.

TTC Streetcar Delay - A Data Science Approach

INTRODUCTION

Delays in public transit can disrupt daily routines and impact customer satisfaction. To address this, we analyzed TTC streetcar delay data from January 2023 to September 2024, applying machine learning techniques to predict delay types and provide actionable insights.

Our goal is to classify TTC streetcar delays into short, normal, or long delay categories to better understand delay patterns and help optimize operations. Predictors used for the calculations include:

  • Day of week
  • Holiday
  • Season
  • Time of day
  • Line
  • Location
  • Bound
  • Vehicle
  • Incident type
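
As a minimal sketch of the target variable, the function below maps a delay in minutes to one of the three categories. The cut-offs `short_max` and `long_min` are hypothetical; the project's actual thresholds are not stated here.

```python
def categorize_delay(minutes, short_max=5, long_min=15):
    """Map a delay in minutes to 'short', 'normal', or 'long'.

    short_max and long_min are illustrative assumptions,
    not the thresholds used in the project.
    """
    if minutes < 0:
        raise ValueError("delay must be non-negative")
    if minutes <= short_max:
        return "short"
    if minutes < long_min:
        return "normal"
    return "long"


# Toy delays in minutes (not taken from the TTC data set).
toy_delays = [2, 5, 9, 15, 120]
toy_labels = [categorize_delay(m) for m in toy_delays]
print(toy_labels)
```

Changing the two cut-offs shifts the class balance, which in turn affects how much class weighting the models need.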

CHALLENGES

Two data-quality challenges were observed: measurement error and recall bias. Exploratory data analysis revealed pronounced clusters at exact 10-minute intervals, with dips in the minutes between, which suggests that delay durations were often rounded when recorded (potential recall bias). In addition, extreme outliers were observed beyond the 1-hour delay mark, extending up to 15 hours.
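
One way to quantify this kind of clustering is a simple heaping check: compare the share of records that fall on a multiple of 10 minutes with the share expected if minutes were spread evenly. The sketch below assumes delays are recorded as whole minutes; the function name and the toy sample are illustrative only.

```python
def heaping_ratio(delays, base=10):
    """Return observed share on multiples of `base` divided by 1/base.

    A ratio near 1 suggests no rounding; a ratio well above 1 suggests
    that values were rounded to the nearest `base` minutes.
    """
    positive = [d for d in delays if d > 0]
    if not positive:
        raise ValueError("no positive delays to analyse")
    on_multiple = sum(1 for d in positive if d % base == 0)
    # observed share / expected share == on_multiple * base / n
    return on_multiple * base / len(positive)


# Toy sample of delays in whole minutes (not real TTC records).
sample = [10, 10, 20, 30, 7, 12, 10, 20, 9, 40]
ratio = heaping_ratio(sample)
print(f"heaping ratio: {ratio:.1f}")
```

The uniform baseline of `1 / base` is itself an assumption; real delay distributions are skewed, so the ratio is best read as a rough indicator rather than a formal test.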

DATA PREPROCESSING

We worked with about 4,400 one-hot encoded features derived from delay records. Data pre-processing involved removing null and missing values, stratified sampling, class weight balancing, and dimensionality reduction. For dimensionality reduction, we used feature importances derived from a random forest, applied principal component analysis (PCA), and tested uniform manifold approximation and projection (UMAP).
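
To illustrate class weight balancing, the sketch below computes inverse-frequency weights with the formula `n_samples / (n_classes * class_count)`, which is the heuristic behind scikit-learn's `class_weight="balanced"` option. Whether the project used this exact scheme is an assumption, and the label counts are toy values.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Toy, imbalanced label vector (illustrative proportions only).
toy_targets = ["normal"] * 6 + ["short"] * 3 + ["long"] * 1
weights = balanced_class_weights(toy_targets)
for cls, w in sorted(weights.items()):
    print(f"{cls}: {w:.3f}")
```

Rare classes receive larger weights, so misclassifying them costs more during training.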

PREDICTIVE MODELS

We explored seven predictive models, optimizing for balanced accuracy. The random forest classifier, XGBoost classifier, and neural network were applied to various transformed data sets. The ensemble bagging method with PCA emerged as the top performer, while the other models were more effective at identifying the majority class but struggled to detect the minority classes.
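
Balanced accuracy is the mean of the per-class recalls, which is why it penalizes models that only detect the majority class. The sketch below computes it from scratch on toy labels; the choice of "normal" as the majority class is an assumption made for illustration.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    recalls = [hits[cls] / totals[cls] for cls in totals]
    return sum(recalls) / len(recalls)


# A model that always predicts the majority class (toy data).
y_true = ["normal"] * 8 + ["short"] + ["long"]
y_pred = ["normal"] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_accuracy = balanced_accuracy(y_true, y_pred)
print(f"accuracy: {plain_accuracy:.2f}, balanced accuracy: {bal_accuracy:.2f}")
```

Here plain accuracy looks acceptable while balanced accuracy drops to one third, the same score as random guessing among three classes.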

PROTOTYPE

To make our findings actionable, we developed an interactive web application hosted on Render, allowing users to predict delay types based on selected features.

INSIGHTS

Here are key insights from our data analysis highlighting critical patterns and trends in streetcar delays.

  • Top 10 feature importances: Incident-related features such as diversions and mechanical issues are the most influential in predicting delay types, along with key routes such as lines 512 and 506.

  • Top 10 incidents: Diversions lead delay causes with 931 hours annually, followed by operational incidents, underscoring areas for improvement.

  • Line and line type: The Queen line sees the highest delays, while regular service lines account for 94% of total delay hours, making them a priority for optimization.

  • Line names: Lines like 501, 504, 505, and 506 appear frequently, confirming their critical role in addressing delays.

  • Delay hotspots: King and Church, and Dundas West Station, are the top delay hotspots.

  • Time of day: Off-peak hours accumulate the most delay, at 1,820 hours annually.
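
Totals like the ones above come from summing delay minutes per group and converting to hours. The sketch below shows that aggregation on a few made-up records; the field names `incident`, `line`, and `min_delay` are hypothetical and may not match the project's column names.

```python
from collections import defaultdict


def delay_hours_by(records, key):
    """Sum delay hours per value of `key`, sorted from largest to smallest."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[key]] += rec["min_delay"] / 60
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Made-up records for illustration; not real TTC delay data.
toy_records = [
    {"incident": "Diversion", "line": "501", "min_delay": 90},
    {"incident": "Mechanical", "line": "504", "min_delay": 30},
    {"incident": "Diversion", "line": "506", "min_delay": 60},
]

by_incident = delay_hours_by(toy_records, "incident")
for name, hours in by_incident:
    print(f"{name}: {hours:.1f} h")
```

The same function can be called with `"line"` or any other field to rank a different dimension.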

RECOMMENDATIONS

Key recommendations include:

  • Addressing measurement errors and recall bias.
  • Prioritizing the dominant features contributing most to delays.
  • Leveraging these predictions for continuous improvements in operational efficiency.

PROJECT TEAM

Meet the team behind this project:

  • Jay Menorca: GitHub, extract, load, transform (ELT), and DevOps.
  • Ly Nguyen: Data pre-processing, exploratory analysis (EDA), machine learning models, visualization, insights, and interactive web app prototype.
  • XiaoXiao Gong: Descriptive analytics, visualization, and actionable insights.
  • Shruti Patil: Tableau interactive dashboards.

Video Editing: Ly Nguyen


Acknowledgement: This project was made possible by the open data initiative of the Toronto Transit Commission (TTC) and the support of the University of Toronto Data Sciences Institute.

About

UOT-DSI Cohort 4 - Team 24's Project Repo
