This project investigates 2024 delays across the Toronto Transit Commission (TTC) system, covering buses, streetcars, and subway lines. The goal is to predict future delays and analyze the factors that contribute to them through data cleaning, wrangling, visualization, and modeling.
Datathon-TTC/
├── data/
│ ├── raw/ # original data (provided by organizers, plus external data)
│ └── cleaned/ # cleaned data (output of data_preprocessing.Rmd)
├── data_preprocessing.Rmd # File for cleaning and preprocessing data
├── visualization.Rmd # File for additional cleaning, mapping, and visualization
├── visualization.html # Knitted, interactive version of the visualizations
├── incidence_analysis.Rmd # File for analyzing the top incident types
- Source: Esri Canada Education - TTC Stations Coordinates
- License/terms of use: The data is public, with no special restrictions or limitations specified.
- Preprocessing steps: Preprocessed in ArcGIS Pro to extract XY coordinates of subway stations (specific steps documented in mapping.Rmd).
- Justification: Coordinates were needed to map the data for visualization.
- Source: TTC Subway Shapefiles - City of Toronto Open Data Portal
- License/terms of use: The data is public and released under an Open Government Licence, which allows use under few conditions.
- Preprocessing steps: None; used as-is in mapping.Rmd.
- Justification: Adds geographic context to our map visualizations.
- Source: Environment and Climate Change Canada - Toronto City Daily Data Report
- License/terms of use: The data is publicly available through Environment and Climate Change Canada with no restrictions on usage for non-commercial purposes.
- Preprocessing steps: Joined to the TTC data (in data_cleaning.R).
- Justification: The daily climate data describes the environmental conditions around TTC stations, which lets us analyze potential relationships between weather patterns and transit delays.
- Rush hour information from the TTC
- Holiday dates from the City of Toronto
We used ChatGPT for code assistance and debugging.