The dataset comes from a telecommunications company: https://drive.google.com/file/d/1dPCG76ST6NohYKtVMGv6HpFL-jD5p1eJ/view
The goal is to predict customer churn using Machine Learning.
pandas:
Data analysis and manipulation tool.
matplotlib:
Visualization library.
seaborn:
Data visualization library based on matplotlib; it improves the default style of matplotlib plots.
NumPy:
Numerical analysis library.
scikit-learn:
Machine Learning library.
XGBoost:
Optimized gradient boosting library that builds ensembles of decision trees.
LightGBM:
Gradient boosting framework that uses tree-based learning algorithms.
The first part covers data cleaning and a concise exploratory data analysis (EDA).
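A minimal sketch of typical cleaning and EDA steps with pandas follows. The sample rows and the column names `tenure`, `TotalCharges` and `Churn` are hypothetical assumptions for illustration, not read from the actual dataset. The steps shown (coercing a text column to numbers, dropping rows that fail to parse, encoding the target as 0/1 and summarizing churn) are common for this kind of data but may differ from the project's own cleaning.

```python
import pandas as pd

# Hypothetical sample rows; column names are assumptions, not the real dataset's schema.
raw = pd.DataFrame({
    "tenure": [1, 34, 2, 45, 0],
    "TotalCharges": ["29.85", "1889.5", "108.15", "1840.75", " "],
    "Churn": ["No", "No", "Yes", "No", "Yes"],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy: numeric charges, no unparseable rows, binary target."""
    out = df.copy()
    # Strings that cannot be parsed as numbers (e.g. a blank " ") become NaN.
    out["TotalCharges"] = pd.to_numeric(out["TotalCharges"], errors="coerce")
    # Drop the rows whose charges could not be parsed.
    out = out.dropna(subset=["TotalCharges"]).reset_index(drop=True)
    # Encode the target as 0/1 so the classifiers can consume it.
    out["Churn"] = out["Churn"].map({"No": 0, "Yes": 1})
    return out


df = clean(raw)

# Concise EDA: column types, overall churn rate, mean tenure per class.
print(df.dtypes)
print("churn rate:", df["Churn"].mean())
print(df.groupby("Churn")["tenure"].mean())
```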
The second part describes how the model was defined. Random Forest, XGBoost, SVM and AdaBoost serve as base estimators in a stacked model, with LightGBM as the final estimator.
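The stacked architecture can be sketched with scikit-learn's `StackingClassifier`. This is an illustrative configuration, not the project's actual code: the data are synthetic (`make_classification`) rather than the churn dataset, all hyperparameters are assumptions, and if `xgboost` or `lightgbm` is not installed the sketch falls back to scikit-learn's `GradientBoostingClassifier` purely so that it still runs.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

try:
    from xgboost import XGBClassifier
    xgb = XGBClassifier(n_estimators=100, random_state=0)
except ImportError:  # assumption: fallback only so the sketch runs without xgboost
    xgb = GradientBoostingClassifier(random_state=0)

try:
    from lightgbm import LGBMClassifier
    final = LGBMClassifier(n_estimators=100, verbose=-1)
except ImportError:  # assumption: fallback only so the sketch runs without lightgbm
    final = GradientBoostingClassifier(random_state=0)

# Synthetic, imbalanced stand-in for the churn features (not the real dataset).
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.73], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

base_estimators = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("xgb", xgb),
    # SVMs are scale-sensitive, so standardize first; probability=True
    # exposes predict_proba for the stacker.
    ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
    ("ada", AdaBoostClassifier(random_state=0)),
]

stack = StackingClassifier(
    estimators=base_estimators,
    final_estimator=final,
    cv=5,  # the final estimator is trained on out-of-fold base predictions
    stack_method="predict_proba",
)
stack.fit(X_train, y_train)
print("test accuracy:", stack.score(X_test, y_test))
```

With `cv=5`, the final estimator learns from out-of-fold probabilities of the base models, which keeps it from simply trusting the base models' overfit predictions on their own training data.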
The following diagram depicts the final model architecture.