Akash20x edited this page Sep 5, 2021 · 8 revisions

Telecom Customer Churn Prediction

This short project was built under the guidance of Packt & Team Epic, with Rehan Guha as our mentor.

Problem Statement

A common problem faced by most telecom companies is retaining their customer base. Here we aim to identify, from past customer behavior, which users are likely to stop using the service. This in turn highlights the features that drive most dropouts, so those areas can be improved. We do this by building predictive models and, based on their results, classifying each customer as either "CHURN" or "NOT CHURN".

Dataset

This dataset was obtained from Kaggle, an open data-science community. The dataset can be segmented into several parts, including customer geographic & account details, customer plans, customer calling activity, and customer complaints.

Technology used

| Element | Description |
| --- | --- |
| Python | Programming language |
| Numpy | Python library used for advanced mathematical operations |
| Pandas | Python library used for data manipulation & analysis |
| Matplotlib and Seaborn | Python libraries used for data visualization |
| Sklearn | Python library used for data pre-processing and modeling |
| pickle | Python library used for saving models |
| Requests | Python library used for making HTTP requests |
| Flask | Used for making web applications and creating APIs |
| HTML & CSS | Basic languages for website design |
| Bootstrap | CSS framework used to create interactive websites |
| Heroku | Cloud platform used for deploying models |

Process Flow

1. Data Dictionary: It explains each feature's information and its relevance to the domain.

2. IDA: We performed initial data analysis to summarize and visualize the data at different levels.

3. EDA: Exploratory data analysis was performed to gain useful insights from the data by summarizing their main characteristics through statistical graphs and various data visualization methods.

4. Data Preparation: In this step, we handled all the missing values through a statistical approach, and due attention was given to outliers present in the data. Finally, we standardized the continuous features to get the data ready for model building.
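The preparation step above can be sketched as follows. This is a minimal, illustrative example: the column names, toy values, and the specific choices of median imputation and percentile clipping are assumptions for demonstration, not the project's exact recipe.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy frame standing in for the churn dataset (columns are hypothetical)
df = pd.DataFrame({
    "total_day_minutes": [180.0, np.nan, 210.5, 95.2],
    "customer_service_calls": [1.0, 4.0, np.nan, 2.0],
})

# Fill missing values with the column median (a simple statistical approach)
df = df.fillna(df.median())

# Temper outliers by clipping to the 1st/99th percentiles instead of dropping rows
low, high = df.quantile(0.01), df.quantile(0.99)
df = df.clip(lower=low, upper=high, axis=1)

# Standardize continuous features to zero mean / unit variance
scaled = StandardScaler().fit_transform(df)
```

After this, `scaled` contains no missing values and each column has zero mean, which is what the modeling step expects.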

5. Data Modeling: Once the above steps were complete, the next task was to test the data with different classification algorithms. We used three: Logistic Regression, Decision Tree, and Random Forest. We followed a pipeline approach to fit all models: a single pipeline was created for each model, with three stages combining PCA, cross-validation through GridSearchCV, and the algorithm under test. Random Forest was the best fit for this data, giving the highest accuracy and the best evaluation metrics overall.
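The pipeline approach described above can be sketched with scikit-learn. The synthetic dataset, parameter grid, and fold count here are illustrative assumptions, not the project's actual settings; swapping `RandomForestClassifier` for `LogisticRegression` or `DecisionTreeClassifier` gives the other two pipelines.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the prepared churn data
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

# One pipeline per model: PCA for dimensionality reduction, then the classifier
pipe = Pipeline([
    ("pca", PCA()),
    ("clf", RandomForestClassifier(random_state=42)),
])

# Cross-validated grid search over PCA components and forest size
param_grid = {"pca__n_components": [5, 8], "clf__n_estimators": [50, 100]}
grid = GridSearchCV(pipe, param_grid, cv=3)
grid.fit(X, y)

best_model = grid.best_estimator_
```

Putting PCA inside the pipeline ensures it is refit on each cross-validation fold, avoiding leakage from the held-out fold into the projection.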

6. Pickling: Here, the Random Forest model was saved as a pickle file.
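Saving and restoring the model with pickle looks like this. The file name and the small stand-in model are assumptions for illustration.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in for the trained Random Forest model
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Serialize the fitted model to disk (file name is illustrative)
with open("churn_model.pkl", "wb") as f:
    pickle.dump(model, f)

# Later (e.g. in the API), load it back without retraining
with open("churn_model.pkl", "rb") as f:
    restored = pickle.load(f)
```

The restored model produces the same predictions as the original, so the API can serve it without access to the training code.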

7. API development: Created an API using Flask. Users can call this API to get predictions from the trained model.
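A minimal Flask endpoint for serving predictions could look like the sketch below. The route name, JSON payload shape, and the inline stand-in model are assumptions; the real app would unpickle the trained churn model at startup instead.

```python
from flask import Flask, jsonify, request
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in model; in the real app this would be loaded from the pickle file
X, y = make_classification(n_samples=100, n_features=4, random_state=0)
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # Expect JSON like {"features": [0.1, 0.2, 0.3, 0.4]}
    features = request.get_json()["features"]
    prediction = model.predict([features])[0]
    return jsonify({"churn": int(prediction)})
```

Loading the model once at import time, rather than per request, keeps each prediction call fast.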

8. Data testing: We tested the model on individual data points as well as on bulk data. In both cases, the model performed well on accuracy and other metrics.
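Single-point versus bulk scoring differs only in the shape of the input passed to the model, as the sketch below shows on synthetic data (an assumption, not the project's test set).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for held-out churn data
X, y = make_classification(n_samples=200, n_features=6, random_state=1)
model = RandomForestClassifier(n_estimators=25, random_state=1).fit(X, y)

single = model.predict(X[:1])   # one customer at a time
bulk = model.predict(X)         # whole batch in a single call
accuracy = (bulk == y).mean()
```

The same shapes apply when the data arrives through the API: a one-row payload yields one label, a bulk payload yields one label per row.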

9. UI Development: The UI was created using HTML, CSS & Bootstrap.

10. Deployment: The backend was created using the Flask web framework, with a Flask API managing HTTP requests. User input is taken from the front end and the prediction results are displayed back to the user.

CLASSIFICATION REPORTS

Based on testing the different models, we evaluated their classification reports on the training data:

  • Logistic Regression:

(Image: training classification report)

  • Decision Tree:

(Image: training classification report)

  • Random Forest:

(Image: training classification report)
