GitHub - SnehaDharne/BigDataAnalytics-MVCollisions: Leveraging NYC Open Data, this repository contains Databricks notebooks for analyzing motor vehicle collisions. We perform EDA, spatial clustering, and predictive modeling on collision, vehicle, and person datasets to understand accident trends and predict potential risks.

New York City experiences a high volume of motor vehicle collisions. This big data project uses NYC Open Data to analyze them, drawing on three key datasets: collisions, vehicles, and people involved. By analyzing these interconnected datasets, we aim to gain insight into several aspects of NYC traffic accidents, including:

▪ Accident Patterns (Factors): Highlighting trends in accident times, vehicle types involved, pre-accident actions, victim location, contributing factors, etc.

▪ Impact Analysis (Consequences): Understanding the types of public property damaged and the harm to human life.

▪ Spatial Distribution (Location Clustering): Examining collision distribution across boroughs, identifying potential hotspots, and assigning weights to them.

▪ Predictive Modeling: Developing models to predict fatalities and injuries in high-risk areas.
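To make the hotspot-and-weights idea concrete, here is a minimal sketch in plain Python. It is not the clustering method used in the notebooks: it swaps in simple grid binning, and the field names (`lat`, `lon`, `injured`, `killed`), the cell size, and the severity weights are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict

# Hypothetical severity weights; the project does not state its weighting scheme.
INJURY_WEIGHT = 1.0
FATALITY_WEIGHT = 5.0


def grid_cell(lat, lon, cell_deg=0.005):
    """Map a coordinate to an integer grid cell that is cell_deg degrees wide."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def hotspot_weights(collisions, cell_deg=0.005):
    """Sum a severity score per grid cell.

    Each collision is a dict with 'lat', 'lon', 'injured', and 'killed'.
    Rows with missing coordinates are skipped.
    """
    weights = defaultdict(float)
    for c in collisions:
        if c.get("lat") is None or c.get("lon") is None:
            continue
        # Every collision counts once, plus extra weight for injuries and deaths.
        score = 1.0 + INJURY_WEIGHT * c["injured"] + FATALITY_WEIGHT * c["killed"]
        weights[grid_cell(c["lat"], c["lon"], cell_deg)] += score
    return dict(weights)


def top_hotspots(weights, n=3):
    """Return the n heaviest cells as (cell, weight) pairs, heaviest first."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Made-up sample rows, only to show the call pattern.
sample = [
    {"lat": 40.7580, "lon": -73.9855, "injured": 2, "killed": 0},
    {"lat": 40.7581, "lon": -73.9856, "injured": 0, "killed": 1},
    {"lat": 40.6782, "lon": -73.9442, "injured": 1, "killed": 0},
    {"lat": None, "lon": None, "injured": 0, "killed": 0},
]
print(top_hotspots(hotspot_weights(sample), n=2))
```

Grid binning is only a rough stand-in for a real clustering step; a density-based method would follow street geometry better, at a higher compute cost.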

This project aims to provide valuable data-driven insights to improve road safety and inform traffic management strategies in New York City.

NYC MOTOR VEHICLE COLLISION ANALYTICS AND PREDICTIVE MODELING: Databricks notebooks

Initial data cleaning and EDA -
https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/2223093725023385/3402554983214854/563989775462406/latest.html
Data Wrangling -
https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/2223093725023385/1978338885242709/563989775462406/latest.html
Data Cleaning for Spatial Clustering -
https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/2202773089231587/3270097802115930/6397218320977099/latest.html
Features and target variable for predictive modeling -
https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/2202773089231587/288185156849501/6397218320977099/latest.html
Train Test Split and Join Data -
https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/2202773089231587/3538449047751016/6397218320977099/latest.html
Model Training -
https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/2223093725023385/3086432350858478/563989775462406/latest.html
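When a train/test split is combined with joins across the collision, vehicle, and person data, one thing to watch for is rows from the same collision landing on both sides of the split. The sketch below shows one possible way to avoid that, and is not necessarily what the notebooks do: it derives the split from a stable hash of a shared collision identifier. The key name `collision_id`, the 20% test fraction, and both helper functions are assumptions for illustration.

```python
import hashlib


def split_bucket(collision_id, test_fraction=0.2):
    """Assign a collision to 'train' or 'test' from a stable hash of its ID."""
    digest = hashlib.sha256(str(collision_id).encode("utf-8")).hexdigest()
    # Map the first 8 hex digits to a number in [0, 1).
    position = int(digest[:8], 16) / 0x100000000
    return "test" if position < test_fraction else "train"


def split_rows(rows, key="collision_id", test_fraction=0.2):
    """Split any table (collisions, vehicles, people) by its collision ID."""
    train, test = [], []
    for row in rows:
        if split_bucket(row[key], test_fraction) == "test":
            test.append(row)
        else:
            train.append(row)
    return train, test


# Made-up rows: two people involved in the same collision stay together.
people = [
    {"collision_id": 101, "role": "driver"},
    {"collision_id": 101, "role": "pedestrian"},
    {"collision_id": 202, "role": "driver"},
]
people_train, people_test = split_rows(people)
```

Because the assignment depends only on the ID, each dataset can be split separately, before or after the join, and the resulting sides still line up.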

NOTE

During preprocessing, we had to move our data from one user's workspace to another's without redoing the earlier preprocessing, because the joins and discretization were expensive operations. We therefore used df.display() to render the full dataset and download it, transferred the file to the other user, and uploaded it to their DBFS to continue with the subsequent preprocessing steps. As a result, at many points our code loads datasets other than the ones mentioned in our presentation.
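A manual download-and-upload handoff like this is easier to trust if both sides can confirm that they hold the same file. The following is a small sketch, assuming the export is a CSV file with a header row; the helper names `file_fingerprint` and `same_export` are hypothetical and not part of the project's notebooks.

```python
import csv
import hashlib


def file_fingerprint(path):
    """Return (data_row_count, sha256_hex) for a CSV file with a header row."""
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash the raw bytes in 1 MiB chunks so large exports fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha.update(chunk)
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        rows = sum(1 for _ in reader)
    return rows, sha.hexdigest()


def same_export(sent_path, received_path):
    """True when both files have identical bytes (and therefore row counts)."""
    return file_fingerprint(sent_path) == file_fingerprint(received_path)
```

The sender can share the row count and hash alongside the file; the row count is also worth comparing against the source DataFrame's own count, in case the download was truncated.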

Repository files:
README.md
clustering-join-data.ipynb
data-cleaning-for-spatial-clustering.ipynb
data-wrangling.ipynb
eda-data-cleaning.ipynb
features-and-target-variables.ipynb
links-to-databricks-notebook.pdf
model-training.ipynb
teamA-mv-collisions.pdf
