E-commerce customer clustering problem and a comparison of statistical and machine learning models

alyssonvidal/E-Commerce-Clusterization

Portuguese (BR) version

Customer Segmentation - Loyalty Program

Problem Statement

To increase customer retention, the CEO of an e-commerce company wants to launch a new marketing campaign called "Insiders". The campaign offers potentially loyal customers a special benefits plan (discounts, gifts, prizes) once they reach a certain goal. The CEO asked the data science team to identify the company's key customer segments and which customers are most eligible for the campaign.

The case study covers an online retail store based in the UK. Invoices were collected from approximately 5,000 customers over a period of about one year (November 2016 to December 2017). The dataset can be found on Kaggle.

Objective

Main:

  • Use machine learning algorithms to identify the company's most relevant customer segments.
  • Identify which customers are most eligible to participate in the loyalty program.
  • Publish the results to a dashboard that can be accessed from anywhere.

Secondary:

  • Using the RFM metrics (Recency, Frequency, Monetary value), compare the results of a statistical model and a machine learning model.
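
As an illustration of the RFM metrics mentioned above, here is a minimal sketch of how they could be computed per customer with pandas. The column names (`customer_id`, `invoice_no`, `invoice_date`, `amount`), the toy data, and the `compute_rfm` helper are assumptions made for this example, not the repository's actual schema or code.

```python
import pandas as pd

# Toy invoice data; column names are hypothetical, not the dataset's real schema.
invoices = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "invoice_no": ["A1", "A2", "B1", "C1", "C2", "C3"],
    "invoice_date": pd.to_datetime([
        "2017-01-10", "2017-11-30", "2017-06-15",
        "2017-10-01", "2017-11-15", "2017-12-05",
    ]),
    "amount": [50.0, 70.0, 20.0, 200.0, 150.0, 300.0],
})


def compute_rfm(df, snapshot_date):
    """Aggregate invoices into one Recency/Frequency/Monetary row per customer."""
    return df.groupby("customer_id").agg(
        # Recency: days between the snapshot date and the customer's last purchase.
        recency_days=("invoice_date", lambda d: (snapshot_date - d.max()).days),
        # Frequency: number of distinct invoices.
        frequency=("invoice_no", "nunique"),
        # Monetary value: total amount spent.
        monetary=("amount", "sum"),
    )


# Use the day after the last invoice as the reference date.
snapshot = invoices["invoice_date"].max() + pd.Timedelta(days=1)
rfm = compute_rfm(invoices, snapshot)
print(rfm)
```

A table like `rfm` could feed both approaches being compared: a rule-based (statistical) scoring and a clustering model.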

Development Stages

Data Preprocessing
Dealing with missing, duplicated and bad values, fixing data types, feature engineering, data imputation...
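
The cleaning steps listed above might look roughly like the sketch below. The column names, the sample rows, and the `clean_invoices` helper are hypothetical. For brevity this example drops rows with a missing customer ID instead of imputing them, which may differ from what the project does.

```python
import pandas as pd


def clean_invoices(raw):
    """Basic cleaning pass over a raw invoice table (hypothetical column names)."""
    df = raw.drop_duplicates().copy()               # remove exact duplicate rows
    df = df.dropna(subset=["customer_id"])          # assumption: drop rows without a customer
    df["customer_id"] = df["customer_id"].astype(int)
    df["invoice_date"] = pd.to_datetime(df["invoice_date"])
    # Treat non-positive quantities or prices as bad values (e.g. returns, adjustments).
    df = df[(df["quantity"] > 0) & (df["unit_price"] > 0)]
    # Feature engineering: revenue per invoice line.
    df["gross_revenue"] = df["quantity"] * df["unit_price"]
    return df.reset_index(drop=True)


raw = pd.DataFrame({
    "customer_id": [10.0, 10.0, None, 11.0, 12.0],
    "invoice_date": ["2017-01-05", "2017-01-05", "2017-02-01", "2017-03-10", "2017-04-02"],
    "quantity": [2, 2, 1, -3, 5],
    "unit_price": [4.5, 4.5, 9.0, 2.0, 1.5],
})
clean = clean_invoices(raw)
print(clean)
```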

Exploratory Data Analysis
Descriptive statistics, Cohort Analysis, Sales over Time, Cancellations, Customers by Country, Customer Ranking, Word Cloud of Products...

Data Preparation
Outlier Detection, Normalization, Standardization, Dimensionality Reduction (PCA, UMAP, t-SNE, Tree Embedding).
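
As a rough sketch of two of the preparation steps named above, the example below standardizes RFM-like features and projects them to two dimensions with PCA using scikit-learn. The synthetic data and the choice of two components are assumptions for illustration only. UMAP, t-SNE and tree embeddings are not shown.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic RFM-like features (recency, frequency, monetary); not real project data.
rng = np.random.default_rng(42)
features = np.column_stack([
    rng.integers(1, 365, size=200),    # recency in days
    rng.poisson(5, size=200) + 1,      # number of purchases
    rng.lognormal(5, 1, size=200),     # total spend (right-skewed)
]).astype(float)

# Standardization: zero mean and unit variance per feature, so that
# distance-based methods are not dominated by the largest-scale column.
scaled = StandardScaler().fit_transform(features)

# Dimensionality reduction: keep the two directions of highest variance.
pca = PCA(n_components=2, random_state=42)
embedding = pca.fit_transform(scaled)

print(embedding.shape)
print(pca.explained_variance_ratio_)
```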

Machine Learning Model
K-Means, Hierarchical Clustering, Gaussian Mixture Model, DBSCAN
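
One common way to compare cluster counts for a model such as K-Means is the silhouette score. The sketch below shows that idea on synthetic blobs. The data, the candidate range of `k`, and the `best_k_by_silhouette` helper are assumptions for this example, and the project may select its model differently.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic, well-separated data standing in for prepared customer features.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)


def best_k_by_silhouette(X, k_values):
    """Fit K-Means for each k and return the k with the highest silhouette score."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


best_k, scores = best_k_by_silhouette(X, range(2, 8))
print(best_k, scores)
```

The same loop could wrap the other listed models (for example `GaussianMixture` or `AgglomerativeClustering`) to compare them on a common metric.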

Deployment
Metabase, AWS (RDS, EC2, S3), Crontab, Papermill, PostgreSQL

Results

Business Report - Loyalty Program
RFM Comparison - Machine Learning vs. Statistical

Tools

Languages: Python
IDE: Visual Studio Code, Jupyter Notebook
Libraries: Pandas, Matplotlib, Seaborn, Plotly, scikit-learn, SciPy, Yellowbrick
Dashboard: Metabase
Deployment: AWS
Methodology: CRISP-DM

Summary

[Image: clusters]
