Data science project investigating the factors affecting delays in TTC buses, streetcars and the subway for SDSS Datathon 2025

lucieyang1/Datathon-TTC

Datathon-TTC

Overview

This project investigates 2024 delays across the Toronto Transit Commission (TTC) system, including buses, streetcars, and subway lines. The goal is to analyze contributing factors and predict future delays through data cleaning, wrangling, visualization, and modeling.

Repository Structure

Datathon-TTC/
├── data/
│   ├── raw/                     # original data (provided by organizers and external data)
│   └── cleaned/                 # cleaned data (output of data_preprocessing.Rmd)
├── data_preprocessing.Rmd       # Cleans and preprocesses the raw data
├── visualization.Rmd            # Additional cleaning, mapping, and visualization
├── visualization.html           # Knitted, interactive version of the visualizations
├── incidence_analysis.Rmd       # Analysis of the top incident types

External Data

1. TTC Stations Coordinate Data from Esri Canada Education

  • Source: Esri Canada Education - TTC Stations Coordinates
  • License/terms of use: The data is public, and no special restrictions or limitations are specified.
  • Preprocessing steps: Preprocessed in ArcGIS Pro to extract XY coordinates of subway stations (specific steps documented in mapping.Rmd).
  • Justification: Coordinates were needed to map the data for visualization.

2. TTC Subway Lines from Toronto Open Data

3. Daily Data Report from Environment and Climate Change Canada

  • Source: Environment and Climate Change Canada - Toronto City Daily Data Report
  • License/terms of use: The data is publicly available through Environment and Climate Change Canada with no restrictions on usage for non-commercial purposes.
  • Preprocessing steps: Joined data to the TTC data (used in data_cleaning.R).
  • Justification: Daily climate data captures the environmental conditions around the TTC stations, which lets us analyze potential relationships between weather patterns and transit delays.
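
The join described in the preprocessing step above could look roughly like the following. This is a minimal sketch in base R, not the project's actual code: the data frames and column names (`date`, `mode`, `min_delay`, `mean_temp_c`, `total_precip_mm`) are hypothetical stand-ins, and it assumes both datasets can be matched on a calendar date.

```r
# Hypothetical sample of TTC delay records (column names are assumptions).
delays <- data.frame(
  date = as.Date(c("2024-01-01", "2024-01-01", "2024-01-02", "2024-01-03")),
  mode = c("bus", "subway", "streetcar", "bus"),
  min_delay = c(10, 4, 7, 15)
)

# Hypothetical daily weather records, one row per date.
weather <- data.frame(
  date = as.Date(c("2024-01-01", "2024-01-02")),
  mean_temp_c = c(-2.5, 0.3),
  total_precip_mm = c(0.0, 4.2)
)

# Left join: keep every delay record and attach the weather for its date.
# Delay dates with no matching weather row get NA in the weather columns.
delays_weather <- merge(delays, weather, by = "date", all.x = TRUE)

print(delays_weather)
```

A left join (`all.x = TRUE`) keeps delay records even when a day's weather is missing, so gaps in the climate data show up as `NA` instead of silently dropping delays.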

Other Resources

AI Tool Usage

We used ChatGPT for code assistance and debugging.
