This project aims to predict whether a user is a 5G user based on their basic information and communication-related data such as call charges, data usage, activity behavior, package type, and region information. The project uses various machine learning algorithms to build predictive models and optimize their performance.
During the 2022 World Internet Conference Wuzhen Summit, the "World Internet Development Report 2022" indicated that in the first quarter of 2022, the global number of 5G users increased by 70 million, reaching approximately 620 million. According to Ericsson and GSMA, the global number of 5G users is expected to exceed 1 billion by the end of 2022. For telecom operators, building user profiles from user-side information makes it possible to target marketing precisely at potential 5G users.
The dataset contains 60 fields, with the `target` field being the prediction target. The main feature fields are divided into two categories: categorical (cat) and numerical (num) features. The dataset is available on the cloud platform, named `train.csv`.
- `id`: Sample identifier
- `cat_0` to `cat_19`: Categorical features
- `num_0` to `num_37`: Numerical features
- `target`: Target field, indicating whether the user is a 5G user
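The field layout above can be turned into feature groups by column-name prefix. The following is a minimal sketch, not code taken from the notebooks: the helper name `split_feature_columns` is hypothetical, and the header is rebuilt from the field list rather than read from `train.csv`, which is not part of this repository.

```python
def split_feature_columns(columns):
    """Group column names by the cat_/num_ prefixes used in the dataset."""
    cat_cols = [c for c in columns if c.startswith("cat_")]
    num_cols = [c for c in columns if c.startswith("num_")]
    return cat_cols, num_cols


# Header reconstructed from the field list (assumption: in practice it
# would come from the first row of train.csv).
header = (
    ["id"]
    + [f"cat_{i}" for i in range(20)]
    + [f"num_{i}" for i in range(38)]
    + ["target"]
)

cat_cols, num_cols = split_feature_columns(header)
print(len(header), len(cat_cols), len(num_cols))  # 60 20 38
```

Keeping the two groups separate makes it easier to apply different preprocessing to each, for example encoding categorical columns and scaling numerical ones.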
Model performance is evaluated using the AUC (Area Under the ROC Curve) metric. A higher AUC indicates better model performance.
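To make the metric concrete, here is a small sketch that computes AUC from its pairwise definition: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions. The notebooks may well rely on a library routine such as scikit-learn's `roc_auc_score` instead.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example: 1 = 5G user, 0 = not a 5G user (made-up values)
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```

This pairwise version runs in quadratic time and is only meant to illustrate the definition. For a full training set, a rank-based library implementation is the practical choice.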
- AdaBoost
- Decision Tree
- K-means Clustering
- KNN with FAISS
- LightGBM
- Logistic Regression
- Naive Bayes
- Random Forest
- XGBoost
- Grid Search: Used in `KNN_Grid Search.ipynb`, `LightGBM_Grid Search.ipynb`, and `XGBoost_Hyperparameter Search.ipynb`.
- K Neighbors Change: Used in `KNN_K_change.ipynb`, `KNN_K_n_neighbors_change.ipynb`, and `KNN_n_neighbors_change.ipynb`.
- Bayesian Optimization: Used in `LightGBM_Bayes.ipynb` and `XGBoost_Bayes_3para.ipynb`.
- Particle Swarm Optimization (PSO): Used in `LightGBM_PSO.ipynb`.
- Feature Crosses: Used in `XGBoost_Feature Crosses.ipynb`.
- Random Search with Early Stopping: Used in `XGBoost_Random Search_Early Stopping.ipynb`.
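As an illustration of the simplest technique in the list, grid search, the sketch below evaluates every combination in a parameter grid and keeps the best-scoring one. It is not the code from the notebooks: `grid_search` and `toy_score` are hypothetical names, the parameter values are made up, and `toy_score` stands in for "train a model with these parameters and return its validation AUC".

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Exhaustively score every parameter combination; return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Hypothetical stand-in for a real train-and-validate step;
    # it peaks at learning_rate=0.1, max_depth=6.
    return 1.0 - abs(params["learning_rate"] - 0.1) - abs(params["max_depth"] - 6) / 100


grid = {"learning_rate": [0.01, 0.1, 0.3], "max_depth": [4, 6, 8]}
best_params, best_score = grid_search(grid, toy_score)
print(best_params)  # {'learning_rate': 0.1, 'max_depth': 6}
```

The cost grows with the product of the grid sizes (9 evaluations here), which is one reason the other listed techniques, such as random search or Bayesian optimization, sample the parameter space instead of enumerating it.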
- 5Gpredict_Adaboosting.ipynb
- 5Gpredict_DecisionTree.ipynb
- 5Gpredict_K-means.ipynb
- 5Gpredict_KNN_FAISS.ipynb
- 5Gpredict_LightGBM.ipynb
- 5Gpredict_LogisticRegression.ipynb
- 5Gpredict_NaiveBayes.ipynb
- 5Gpredict_Randomforest.ipynb
- 5Gpredict_XGBoost.ipynb
- KNN_Grid Search.ipynb
- KNN_K_change.ipynb
- KNN_K_n_neighbors_change.ipynb
- KNN_n_neighbors_change.ipynb
- LightGBM_Bayes.ipynb
- LightGBM_Grid Search.ipynb
- LightGBM_PSO.ipynb
- XGBoost_Feature Crosses.ipynb
- XGBoost_Hyperparameter Search.ipynb
- XGBoost_Bayes_3para.ipynb
- XGBoost_Random Search_Early Stopping.ipynb
- train.csv (not included in this repository)
- Clone the repository: `git clone https://github.com/F4nc1est/5GUserPrediction.git`
- Navigate to the project directory: `cd 5GUserPrediction`
- Run the Jupyter notebooks to see the modeling process and results: `jupyter notebook`
The detailed results and analysis of each model, including performance comparisons and optimization effects, can be found in the respective Jupyter notebooks. The analysis includes the reasoning behind model selection, data analysis, model comparison, and potential improvement suggestions.
We welcome contributions to this project. Please create a pull request or open an issue to discuss any changes.
This project is licensed under the MIT License. See the LICENSE file for details.