This short project was built under the guidance of Packt and Team Epic, with Rehan Guha as our mentor.
A common problem faced by the telecom industry is retaining its customer base. Here, we identify potentially churning users from past customer behavior, i.e., how likely they are to stop using the service. This in turn helps us focus on the features causing major dropouts so we can improve those areas. We do this by building predictive models and, based on their results, classifying each customer as "CHURN" or "NOT CHURN".
The dataset comes from Kaggle, an open community for data science. Its features can be segmented into several groups, including customer geographic and account details, customer plans, customer calling activity, and customer complaints.
Element | Description
---|---
Python | Programming language
Numpy | Python library for advanced mathematical operations
Pandas | Python library for data manipulation and analysis
Matplotlib and Seaborn | Python libraries for data visualization
Sklearn | Python library for data pre-processing and modeling
pickle | Python module for saving the trained model
Requests | Python library for making HTTP requests
Flask | Web framework for building web applications and APIs
HTML, CSS | Basic languages for website design
Bootstrap | CSS framework for building interactive websites
Heroku | Cloud platform for deploying the model
1. Data Dictionary: It explains each feature and its relevance to the domain.
2. IDA: We performed initial data analysis to summarize and visualize the data at different levels (a combined sketch of steps 2 and 3 follows this list).
3. EDA: Exploratory data analysis was performed to gain useful insights from the data by summarizing its main characteristics through statistical graphs and various data visualization methods.
4. Data Preparation: In this step, we handled all missing values through a statistical approach, gave due attention to the outliers present in the data, and finally standardized the continuous features to get the data ready for model building (see the preparation sketch after this list).
5. Data Modeling: Once the above steps were complete, the next task was to test the data with different classification algorithms. We used three: Logistic Regression, Decision Tree, and Random Forest. We followed a pipeline approach to fit all models: a separate pipeline was created for each model, each combining PCA, cross-validation through GridSearchCV, and the algorithm under test. Random Forest was the best fit for this data, giving the best accuracy and the strongest evaluation metrics (see the modeling sketch after this list).
6. Pickling: Here, the Random Forest model was saved as a pickle file (see the pickling sketch after this list).
7. API Development: We created an API using Flask, through which users can query the built model (see the API sketch after this list).
8. Data Testing: We tested the model with individual data points as well as with bulk data submitted at once. In both cases, the model performed well on accuracy and the other metrics (see the testing sketch after this list).
9. UI Development: The UI was created using HTML, CSS, and Bootstrap.
10. Deployment: The backend was created with the Flask web framework; the Flask API manages HTTP requests, taking input from the user and displaying the results in the front end. The application was deployed on Heroku.
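A minimal sketch of the initial and exploratory analysis in steps 2 and 3, assuming the Kaggle data is loaded from a hypothetical `churn.csv` file with a binary `Churn` target column:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the churn dataset (the file name is an assumption)
df = pd.read_csv("churn.csv")

# Initial data analysis: shape, types, missing values, basic statistics
print(df.shape)
df.info()
print(df.isna().sum())
print(df.describe())

# Exploratory data analysis: class balance of the target column
sns.countplot(x="Churn", data=df)
plt.title("Churn vs. Not Churn")
plt.show()
```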
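Step 4 (imputation, outlier handling, standardization) could look like the sketch below; the column names in `num_cols` are placeholders, and capping at percentiles is just one of several outlier strategies:

```python
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Continuous feature names are assumptions for illustration
num_cols = ["total day minutes", "total eve minutes", "total night minutes"]

# Impute missing values with the median (a statistical approach)
imputer = SimpleImputer(strategy="median")
df[num_cols] = imputer.fit_transform(df[num_cols])

# Cap outliers at the 1st and 99th percentiles (illustrative choice)
for col in num_cols:
    low, high = df[col].quantile([0.01, 0.99])
    df[col] = df[col].clip(low, high)

# Standardize the continuous features for model building
scaler = StandardScaler()
df[num_cols] = scaler.fit_transform(df[num_cols])
```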
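The pipeline approach from step 5 can be sketched as follows; the grid values and split settings are illustrative, not the project's actual configuration, and the features are assumed to be numeric after preparation:

```python
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X = df.drop("Churn", axis=1)
y = df["Churn"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# One pipeline per algorithm: PCA followed by the classifier,
# cross-validated with GridSearchCV
models = {
    "logistic": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(random_state=42),
    "forest": RandomForestClassifier(random_state=42),
}

best = {}
for name, clf in models.items():
    pipe = Pipeline([("pca", PCA()), ("clf", clf)])
    grid = GridSearchCV(
        pipe,
        param_grid={"pca__n_components": [5, 10, 15]},  # illustrative values
        cv=5,
        scoring="accuracy",
    )
    grid.fit(X_train, y_train)
    best[name] = grid
    print(name, grid.best_score_, grid.best_params_)
```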
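Saving the winning Random Forest pipeline as a pickle file (step 6) is a one-liner each way; the file name is an assumption:

```python
import pickle

# Persist the fitted Random Forest pipeline to disk
with open("model.pkl", "wb") as f:
    pickle.dump(best["forest"].best_estimator_, f)

# Later, reload it for serving predictions
with open("model.pkl", "rb") as f:
    model = pickle.load(f)
```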
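A minimal sketch of the Flask prediction API from step 7; the route name, JSON layout, and label encoding are assumptions:

```python
import pickle

import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the pickled model once at startup
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Accept either a single record or a list of records as JSON
    payload = request.get_json()
    records = payload if isinstance(payload, list) else [payload]
    X = pd.DataFrame(records)
    preds = model.predict(X)
    labels = ["CHURN" if p == 1 else "NOT CHURN" for p in preds]
    return jsonify(predictions=labels)

if __name__ == "__main__":
    app.run(debug=True)
```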
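Step 8's testing of individual and bulk data points could then be done with the Requests library; the URL assumes a local Flask server and the feature names are placeholders:

```python
import requests

url = "http://127.0.0.1:5000/predict"  # local Flask server

# Single data point (feature names are placeholders)
one = {"total day minutes": 180.5, "customer service calls": 3}
print(requests.post(url, json=one).json())

# Bulk data: a list of records sent in one request
bulk = [
    {"total day minutes": 180.5, "customer service calls": 3},
    {"total day minutes": 95.0, "customer service calls": 1},
]
print(requests.post(url, json=bulk).json())
```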
Based on testing the different models, we evaluated their classification reports at the training level:
- Logistic Regression:
- Decision Tree:
- Random Forest:
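The classification reports above can be produced with scikit-learn; a sketch evaluating each fitted grid search from the modeling step on the training split:

```python
from sklearn.metrics import classification_report

# Training-level classification report for each fitted model
for name, grid in best.items():
    preds = grid.predict(X_train)
    print(name)
    print(classification_report(y_train, preds))
```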