Portuguese (BR) version
To increase customer retention, the CEO of an e-commerce company wants to launch a new marketing campaign called "Insiders". The campaign offers potentially loyal customers a special benefits plan (discounts, gifts, prizes) once they reach a certain goal. The CEO asked the data science team to identify the company's key customer segments and which customers are most eligible for this campaign.
The case study concerns an online retail store based in the UK. Invoices were collected from approximately 5,000 customers over a period of one year (November 2016 to December 2017). The dataset can be found on Kaggle.
Main:
- Use Machine Learning algorithms to identify the company's most relevant customer segments.
- Identify which customers are most eligible to participate in the loyalty program.
- Publish the results to a dashboard that can be accessed from anywhere.
Secondary:
- Use the RFM metric (Recency, Frequency, Monetary) to compare the results of a statistical model and a machine learning model.
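The RFM metric mentioned above can be sketched with pandas as follows. This is a minimal illustration, not the project's actual code: the column names (`customer_id`, `invoice_no`, `invoice_date`, `gross_revenue`), the helper `compute_rfm`, and the sample rows are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical invoice data; column names and values are assumptions,
# not the schema of the real dataset.
invoices = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "invoice_no": ["A1", "A2", "B1", "C1", "C2", "C3"],
    "invoice_date": pd.to_datetime([
        "2017-01-10", "2017-11-20", "2017-03-05",
        "2017-10-01", "2017-11-15", "2017-12-01",
    ]),
    "gross_revenue": [50.0, 70.0, 20.0, 200.0, 150.0, 300.0],
})


def compute_rfm(df, reference_date):
    """Return one row per customer with recency, frequency and monetary value."""
    grouped = df.groupby("customer_id").agg(
        last_purchase=("invoice_date", "max"),   # most recent purchase date
        frequency=("invoice_no", "nunique"),     # number of distinct invoices
        monetary=("gross_revenue", "sum"),       # total amount spent
    )
    # Recency: days between the reference date and the last purchase.
    grouped["recency_days"] = (reference_date - grouped["last_purchase"]).dt.days
    return grouped[["recency_days", "frequency", "monetary"]]


# Use the latest invoice date in the data as the reference point.
rfm = compute_rfm(invoices, reference_date=invoices["invoice_date"].max())
print(rfm)
```

The resulting table can feed either a rule-based (statistical) segmentation, for example by scoring each column by quantiles, or a clustering model.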
Data Preprocessing
Dealing with missing, duplicated and bad values, fixing data types, feature engineering, data imputation...
Exploratory Data Analysis
Descriptive statistics, Cohort, Sales over time, Cancellations, Country of Customers, Rank of Customers, WordCloud of Products...
Data Preparation
Outlier Detection, Normalization, Standardization, Dimensionality Reduction (PCA, UMAP, t-SNE, Tree Embedding).
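As an illustration of two of the preparation steps listed above, the sketch below standardizes a feature matrix and projects it with PCA using scikit-learn. The synthetic RFM-like features, the random seed, and the choice of two components are assumptions made for the example, not settings taken from the project.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic stand-in for customer features (recency, frequency, monetary).
# These values are generated for illustration only.
features = np.column_stack([
    rng.integers(0, 365, size=200),     # recency in days
    rng.integers(1, 50, size=200),      # number of purchases
    rng.lognormal(5, 1, size=200),      # amount spent (right-skewed)
]).astype(float)

# Standardization: zero mean and unit variance per column, so that
# no single feature dominates distance-based methods.
scaled = StandardScaler().fit_transform(features)

# Dimensionality reduction: keep the two directions of highest variance.
pca = PCA(n_components=2, random_state=42)
embedding = pca.fit_transform(scaled)

print(embedding.shape)
print(pca.explained_variance_ratio_)
```

UMAP, t-SNE, or a tree-based embedding could take the place of the PCA step; the embedding produced here is only one possible input space for clustering.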
Machine Learning Model
K-Means, Hierarchical Clustering, Gaussian Mixture Model, DBSCAN
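One common way to choose among candidate clusterings is to compare silhouette scores. The sketch below does this for K-Means over a range of cluster counts with scikit-learn. The synthetic data from `make_blobs`, the range of `k`, and the use of the silhouette score as the selection criterion are assumptions for illustration; the project may have used different data, metrics, or models.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic 2-D data standing in for the prepared customer embedding.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)

scores = {}
for k in range(2, 8):
    # Fit K-Means with k clusters and assign each point to a cluster.
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # Silhouette score lies in [-1, 1]; higher means better-separated clusters.
    scores[k] = silhouette_score(X, labels)

# Pick the number of clusters with the highest silhouette score.
best_k = max(scores, key=scores.get)

for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
print("best k:", best_k)
```

The same loop works for the other listed models by swapping the estimator, for example `AgglomerativeClustering` or `GaussianMixture`, since both also expose `fit_predict`.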
Deploy
Metabase, AWS (RDS, EC2, S3), Crontab, Papermill, Postgres
Business Report - Loyalty Program
RFM Comparison - Machine Learning vs. Statistical Model
Languages: Python
IDE: Visual Studio Code, Jupyter Notebook
Libraries: Pandas, Matplotlib, Seaborn, Plotly, Scikit-learn, SciPy, Yellowbrick
Dashboard: Metabase
Deploy: AWS
Methodology: CRISP-DM