Leveraging NYC Open Data, this repository contains Databricks notebooks for analyzing motor vehicle collisions. We perform EDA, spatial clustering, and predictive modeling on collision, vehicle, and person datasets to understand accident trends and predict potential risks.

SnehaDharne/BigDataAnalytics-MVCollisions

New York City experiences a high volume of motor vehicle collisions. This big data project uses NYC Open Data to analyze them, drawing on three interconnected datasets: collisions, vehicles, and the people involved. By analyzing these datasets together, we aim to gain insight into several aspects of NYC traffic accidents, including:

▪ Accident Patterns (Factors): Highlighting trends in accident times, vehicle types involved, pre-accident actions, location of the victim, contributing factors, and similar attributes.

▪ Impact Analysis (Consequences): Understanding the types of public property damaged and the harm to human life.

▪ Spatial Distribution (Location Clustering): Examining how collisions are distributed across boroughs, identifying potential hotspots, and assigning weights to them.

▪ Predictive Modeling: Developing models to predict fatalities and injuries in high-risk areas.

This project aims to provide valuable data-driven insights to improve road safety and inform traffic management strategies in New York City.
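
The hotspot weighting mentioned under Spatial Distribution can be illustrated with a small sketch. This is not the method used in the notebooks, which this README does not describe; it is a simple grid-binning stand-in written in plain Python. The field names (`latitude`, `longitude`, `injured`, `killed`), the cell size, and the two severity weights are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict

# Hypothetical severity weights; the README does not say how hotspots are weighted.
INJURY_WEIGHT = 1.0
FATALITY_WEIGHT = 5.0


def grid_cell(lat, lon, cell_deg=0.01):
    """Map a coordinate to an integer grid cell (about 1 km on a side in NYC)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def hotspot_weights(collisions, cell_deg=0.01):
    """Sum a severity score per grid cell, skipping rows without coordinates."""
    weights = defaultdict(float)
    for row in collisions:
        lat, lon = row.get("latitude"), row.get("longitude")
        if lat is None or lon is None:
            continue  # collisions with no location cannot be placed on the grid
        score = (
            1.0  # every collision counts once
            + INJURY_WEIGHT * row.get("injured", 0)
            + FATALITY_WEIGHT * row.get("killed", 0)
        )
        weights[grid_cell(lat, lon, cell_deg)] += score
    return dict(weights)


# Made-up sample rows, not real collision records.
sample = [
    {"latitude": 40.7580, "longitude": -73.9855, "injured": 2, "killed": 0},
    {"latitude": 40.7585, "longitude": -73.9851, "injured": 0, "killed": 1},
    {"latitude": 40.6782, "longitude": -73.9442, "injured": 1, "killed": 0},
    {"latitude": None, "longitude": None, "injured": 0, "killed": 0},
]

weights = hotspot_weights(sample)
top_cell = max(weights, key=weights.get)  # the cell with the highest total score
```

A fixed grid is the simplest way to turn point data into weighted areas. A density-based clustering algorithm would follow the street geometry more closely, at the cost of more tuning.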

NYC MOTOR VEHICLE COLLISION ANALYTICS AND PREDICTIVE MODELING

Databricks notebooks

  1. Initial data cleaning and EDA -
    https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/2223093725023385/3402554983214854/563989775462406/latest.html
  2. Data Wrangling -
    https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/2223093725023385/1978338885242709/563989775462406/latest.html
  3. Data Cleaning for Spatial Clustering -
    https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/2202773089231587/3270097802115930/6397218320977099/latest.html
  4. Features and Target Variable for Predictive Modeling -
    https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/2202773089231587/288185156849501/6397218320977099/latest.html
  5. Train Test Split and Join Data -
    https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/2202773089231587/3538449047751016/6397218320977099/latest.html
  6. Model Training -
    https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/2223093725023385/3086432350858478/563989775462406/latest.html
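
As a rough illustration of what steps 4 and 5 involve, the sketch below joins the three datasets on a shared collision identifier, derives a few per-collision features with a binary target, and splits the result into training and test sets. It is written in plain Python for readability and is not taken from the notebooks. The key name `collision_id`, the other field names, the `any_injury` target, and the 80/20 hash-based split are all assumptions.

```python
import hashlib
from collections import defaultdict


def group_by_collision(rows):
    """Index rows by their collision identifier (assumed shared key)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["collision_id"]].append(row)
    return grouped


def build_features(collisions, vehicles, persons):
    """Left-join vehicles and persons onto collisions, one feature row per collision."""
    vehicles_by_id = group_by_collision(vehicles)
    persons_by_id = group_by_collision(persons)
    features = []
    for c in collisions:
        cid = c["collision_id"]
        people = persons_by_id.get(cid, [])
        features.append({
            "collision_id": cid,
            "hour": c["hour"],
            "vehicle_count": len(vehicles_by_id.get(cid, [])),
            "person_count": len(people),
            # Hypothetical target: 1 if anyone in the collision was injured.
            "any_injury": int(any(p["injured"] for p in people)),
        })
    return features


def is_train(collision_id, train_pct=80):
    """Deterministic split: hash the id so a collision always lands on the same side."""
    digest = hashlib.sha256(str(collision_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < train_pct


# Made-up sample rows, not real records.
collisions = [
    {"collision_id": 101, "hour": 8},
    {"collision_id": 102, "hour": 17},
    {"collision_id": 103, "hour": 23},
]
vehicles = [
    {"collision_id": 101, "vehicle_type": "Sedan"},
    {"collision_id": 101, "vehicle_type": "Taxi"},
    {"collision_id": 102, "vehicle_type": "Bus"},
]
persons = [
    {"collision_id": 101, "injured": True},
    {"collision_id": 101, "injured": False},
    {"collision_id": 102, "injured": False},
]

features = build_features(collisions, vehicles, persons)
train = [f for f in features if is_train(f["collision_id"])]
test = [f for f in features if not is_train(f["collision_id"])]
```

Splitting on the collision identifier before any further joins keeps all rows for a given collision on one side of the split, which helps avoid leakage between the training and test sets.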

NOTE

During preprocessing, we had to move our data from one user to another without losing the preprocessing already done, because the joins and discretization were expensive operations. To do this, we used df.display() to view the full data and download it, transferred the file to the other user, and uploaded it to their DBFS to continue with the subsequent preprocessing steps. As a result, our code at many points loads datasets other than the ones mentioned in our presentation.
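
When an intermediate dataset is handed over by file like this, it is worth checking that nothing was lost on the way. The sketch below is a generic illustration in plain Python, not the code used in the notebooks: it writes rows to a CSV file, reads them back, and compares row counts and column names. The helper names `export_rows` and `import_rows` and the sample columns are hypothetical.

```python
import csv
import os
import tempfile


def export_rows(rows, path):
    """Write a list of dicts to CSV and return the number of data rows written."""
    fieldnames = list(rows[0].keys())
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


def import_rows(path):
    """Read the CSV back; note that every value comes back as a string."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Made-up preprocessed rows standing in for the joined, discretized data.
preprocessed = [
    {"collision_id": 101, "hour_bucket": "morning", "vehicle_count": 2},
    {"collision_id": 102, "hour_bucket": "evening", "vehicle_count": 1},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    handoff_path = os.path.join(tmp_dir, "preprocessed_handoff.csv")
    rows_written = export_rows(preprocessed, handoff_path)
    reloaded = import_rows(handoff_path)

# Basic integrity checks after the round trip.
counts_match = rows_written == len(reloaded)
columns_match = list(reloaded[0].keys()) == list(preprocessed[0].keys())
```

Because CSV carries no type information, numeric columns need to be cast again after the upload. A typed format such as Parquet avoids that step when both sides can read it.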
