This project applies K-Means clustering to segment customers of an online retail store based on purchasing behavior. The analysis includes data cleaning, feature engineering, and clustering, with the goal of identifying distinct customer groups to enhance business strategies.
Customer segmentation is a valuable process that helps businesses understand their customer base and tailor their services or marketing strategies accordingly. In this project, we cluster customers based on their total spend, quantity of products purchased, and the number of transactions made.
- Data Cleaning:
  - Handled missing data by forward-filling missing values.
  - Removed rows where CustomerID was missing.
- Feature Engineering:
  - Created new features such as Total Spend by multiplying the quantity purchased and unit price.
  - Aggregated data by CustomerID to calculate total spend, total quantity, and the number of transactions.
- Clustering:
  - Standardized the features for clustering using StandardScaler.
  - Applied K-Means clustering to group customers into distinct segments.
  - Used the Elbow Method to determine the optimal number of clusters.
- Visualization:
  - Plotted customer clusters to understand the relationship between Total Spend and Total Quantity.
- Evaluation:
  - Evaluated the clustering performance using the Silhouette Score.
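The cleaning, feature engineering, and clustering steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in customer_segmentation.py: the column names (InvoiceNo, Description, Quantity, UnitPrice), the tiny inline dataset, the choice to forward-fill only the Description column, and n_clusters=2 are all made up for the example.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Tiny made-up dataset; every column name except CustomerID is an assumption.
raw = pd.DataFrame({
    "InvoiceNo":   ["A1", "A1", "A2", "A3", "A4", "A5", "A6"],
    "Description": ["mug", None, "lamp", "mug", "desk", "desk", "pen"],
    "Quantity":    [2, 1, 50, 3, 40, 60, 5],
    "UnitPrice":   [3.0, 2.5, 10.0, 3.0, 12.0, 11.0, 1.0],
    "CustomerID":  [1.0, 1.0, 2.0, 3.0, 4.0, 2.0, None],
})

# Data cleaning: forward-fill a descriptive column (assumed choice),
# then drop rows that have no CustomerID.
clean = raw.copy()
clean["Description"] = clean["Description"].ffill()
clean = clean.dropna(subset=["CustomerID"])

# Feature engineering: spend per row, then one row of features per customer.
clean["TotalSpend"] = clean["Quantity"] * clean["UnitPrice"]
customers = clean.groupby("CustomerID").agg(
    TotalSpend=("TotalSpend", "sum"),
    TotalQuantity=("Quantity", "sum"),
    NumTransactions=("InvoiceNo", "nunique"),
)

# Clustering: standardize so no single feature dominates the distance,
# then assign each customer to a K-Means cluster.
scaled = StandardScaler().fit_transform(customers)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
customers["Cluster"] = kmeans.fit_predict(scaled)

print(customers)
```

Standardizing before K-Means matters here because total spend and total quantity sit on much larger scales than the transaction count, and K-Means relies on Euclidean distance.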
- Pandas: For data manipulation and analysis.
- NumPy: For numerical operations.
- Scikit-learn: For clustering, scaling, and evaluation metrics.
- Matplotlib: For visualizations and plotting the clusters.
- Jupyter Notebooks/VS Code: For code execution and development.
To run this project locally:
- Clone this repository:
  git clone https://github.com/Mr-Ayushh/Customer-Segmentation-Using-K.git
- Install the required libraries:
  pip install pandas numpy scikit-learn matplotlib
- Run the Python script:
  python customer_segmentation.py
- Elbow Method: Determines the optimal number of clusters by plotting inertia against the number of clusters.
- Customer Segmentation: Clusters customers into distinct groups based on their purchasing behavior.
- Visualization: Provides a scatter plot to visually understand the distribution of customers across clusters.
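As a rough sketch of the Elbow Method and the cluster scatter plot described above, the example below fits K-Means for several values of k, records the inertia of each fit, and plots both the inertia curve and the resulting clusters. The synthetic data from make_blobs stands in for the scaled customer features, and the range of k, the output file name, and the axis labels are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; remove this line for an interactive window
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer feature matrix (assumption).
X, _ = make_blobs(n_samples=300, centers=4, n_features=3, random_state=42)
X = StandardScaler().fit_transform(X)

# Fit K-Means for k = 1..8 and record the inertia
# (sum of squared distances from each point to its cluster center).
ks = list(range(1, 9))
inertias = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias.append(model.inertia_)

# Final model at the k chosen from the elbow (4 here, matching the synthetic data).
labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X)

fig, (ax_elbow, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))
ax_elbow.plot(ks, inertias, marker="o")
ax_elbow.set_xlabel("Number of clusters (k)")
ax_elbow.set_ylabel("Inertia")
ax_elbow.set_title("Elbow Method")

ax_scatter.scatter(X[:, 0], X[:, 1], c=labels)
ax_scatter.set_xlabel("Feature 1 (scaled)")
ax_scatter.set_ylabel("Feature 2 (scaled)")
ax_scatter.set_title("Clusters")

fig.tight_layout()
fig.savefig("elbow_and_clusters.png")
```

The "elbow" is the value of k after which adding more clusters reduces inertia only slightly. Reading it off the plot is a judgment call, so it is usually worth cross-checking against another metric.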
The K-Means algorithm grouped customers into four clusters, revealing distinct patterns in customer behavior. Cluster analysis helps in understanding different types of customers, such as high-spend customers or frequent buyers, which can be useful for targeted marketing.
The clustering model achieved a Silhouette Score of X.XX. The score ranges from -1 to 1, and higher values indicate segments that are more cohesive internally and better separated from one another.
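For reference, the Silhouette Score averages a per-sample value (b - a) / max(a, b), where a is the mean distance from a sample to the other members of its own cluster and b is the mean distance to the members of the nearest other cluster. A minimal sketch of computing it with scikit-learn follows. The data is synthetic (make_blobs), so the printed value is unrelated to this project's result.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data standing in for the scaled customer features (assumption).
X, _ = make_blobs(n_samples=200, centers=4, random_state=0)

# Cluster, then score the assignment against the same feature matrix.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
score = silhouette_score(X, labels)

print(f"Silhouette Score: {score:.2f}")
```

The score should be computed on the same standardized features that were passed to K-Means, since it depends on the distances between samples.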
Feel free to submit pull requests or open issues for any improvements or bug fixes.
This project is licensed under the MIT License.