This repository contains a data analysis project on e-commerce transaction data, covering customer behavior analysis, lookalike modeling, and customer segmentation. It combines exploratory analysis, similarity search, and clustering to derive business insights.
- Performed detailed analysis of customer transactions, product preferences, and sales patterns
- Created comprehensive visualizations using seaborn and matplotlib
- Identified key business insights covering:
  - Seasonal sales patterns and growth trends
  - Category performance and revenue distribution
  - Customer purchase behavior analysis
  - Regional market dynamics
  - Customer value concentration and lifetime value analysis
Implemented two distinct approaches for customer similarity:
- Traditional Approach
  - Utilized feature engineering for customer profile creation
  - Implemented cosine similarity for customer matching
  - Generated top-3 similar customer recommendations
- FAISS Implementation
  - Leveraged Facebook AI Similarity Search (FAISS) for efficient similarity search
  - Optimized for high-dimensional feature spaces
  - Achieved faster query times for a large-scale customer base
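
The cosine-similarity matching described above can be sketched roughly as follows. This is a minimal illustration using only NumPy, not the notebook's actual code: the function name `top_k_lookalikes` and the toy `profiles` matrix are hypothetical, and the profiles are assumed to be numeric and already scaled.

```python
import numpy as np


def top_k_lookalikes(features, k=3):
    """Return indices and cosine scores of the k most similar rows for every row."""
    features = np.asarray(features, dtype=float)
    # Scale each row to unit length so a dot product equals cosine similarity.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    sims = unit @ unit.T
    # A customer should never be returned as their own lookalike.
    np.fill_diagonal(sims, -np.inf)
    # Sort each row by descending similarity and keep the first k columns.
    order = np.argsort(-sims, axis=1)[:, :k]
    scores = np.take_along_axis(sims, order, axis=1)
    return order, scores


# Hypothetical customer profiles: one row per customer, one column per feature.
profiles = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
    [0.5, 0.5, 0.0],
])
neighbors, scores = top_k_lookalikes(profiles, k=3)
print(neighbors[0], scores[0].round(3))
```

The brute-force matrix product is fine for small customer bases. For larger ones, the same ranking can be obtained in FAISS by L2-normalizing the vectors and searching an inner-product index such as `faiss.IndexFlatIP`; whether the notebook uses that particular index type is not stated here.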
- Implemented K-Nearest Neighbors (KNN) clustering
- Performed feature engineering to capture customer behavior
- Analyzed key metrics:
  - Purchase frequency
  - Average order value
  - Total spend
  - Category preferences
- Visualized cluster characteristics and distributions
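
As a rough illustration of how the metrics listed above could be derived from raw transactions, here is a small pandas sketch. The column names (`CustomerID`, `TransactionID`, `Category`, `TotalValue`) and the sample rows are assumptions made for the example; the project's CSV files may use a different schema.

```python
import pandas as pd

# Hypothetical transaction rows; the column names are assumptions.
transactions = pd.DataFrame({
    "CustomerID": ["C1", "C1", "C2", "C2", "C2", "C3"],
    "TransactionID": ["T1", "T2", "T3", "T4", "T5", "T6"],
    "Category": ["Books", "Books", "Electronics", "Books", "Electronics", "Clothing"],
    "TotalValue": [20.0, 30.0, 100.0, 50.0, 150.0, 40.0],
})

# One row per customer: purchase frequency and total spend.
features = transactions.groupby("CustomerID").agg(
    purchase_frequency=("TransactionID", "nunique"),
    total_spend=("TotalValue", "sum"),
)
# Average order value is total spend divided by the number of orders.
features["avg_order_value"] = features["total_spend"] / features["purchase_frequency"]

# Category preference: share of each customer's spend that falls in each category.
category_spend = transactions.pivot_table(
    index="CustomerID", columns="Category", values="TotalValue",
    aggfunc="sum", fill_value=0.0,
)
category_share = category_spend.div(category_spend.sum(axis=1), axis=0).add_prefix("share_")

customer_features = features.join(category_share)
print(customer_features)
```

A table like `customer_features` would typically be standardized before being passed to a clustering step, since total spend and category shares live on very different scales.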
- EDA.ipynb: Exploratory Data Analysis
- Lookalike_Model.ipynb: Customer similarity modeling
- Clustering.ipynb: Clustering analysis
- Three CSV files and one PDF
- Python
- Pandas & NumPy
- Scikit-learn
- FAISS
- Seaborn & Matplotlib
- Jupyter Notebook
The analysis revealed insights into customer behavior, sales patterns, and market dynamics. Detailed findings can be found in the respective notebooks and the final EDA report.
- Clone the repository
- Install required dependencies
- Run the Jupyter notebooks in sequence
pandas
numpy
scikit-learn
faiss-cpu
seaborn
matplotlib
jupyter
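
Before running the notebooks, it can help to confirm that the packages above are importable. The snippet below is one possible check, not part of the repository. Note that two pip names differ from their import names (`scikit-learn` imports as `sklearn`, `faiss-cpu` as `faiss`). Checking `jupyter_core` as a stand-in for a working Jupyter install is an assumption, and `missing_packages` is a hypothetical helper.

```python
from importlib.util import find_spec

# pip package name -> module name used in `import`.
# `jupyter_core` is assumed to be present whenever Jupyter is installed.
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "faiss-cpu": "faiss",
    "seaborn": "seaborn",
    "matplotlib": "matplotlib",
    "jupyter": "jupyter_core",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose import module cannot be found."""
    return [pip_name for pip_name, module in required.items() if find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Missing packages; install with: pip install " + " ".join(missing))
else:
    print("All required packages are importable.")
```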