Using Spark to Predict Churn

A binary classification problem solved with Spark and its MLlib library. An accompanying Medium post describes the project.

Project Motivation

This project explores how to build a churn-prediction model with Spark. It covers the following steps:

- explore and manipulate the dataset
- engineer features relevant to the problem
- split the data into train and test sets by sampling on churn
- build binary classifier models with Spark's DataFrame-based MLlib
- select and fine-tune the final model with Spark's ML Pipelines and a StratifiedCrossValidator

Sparkify.ipynb
- A notebook containing all the analysis code.
requirements.txt
- Lists the Python packages required to run this project.
.gitignore
- Tells git which files and patterns to ignore.
