Skip to content

OmarFaig/eCommerceAnalytics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

2 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸ›’ Olist E-Commerce Customer Intelligence

End-to-end analytics portfolio project β€” transforming 99k Brazilian e-commerce orders into actionable business intelligence using Python, Pandas, Plotly, and Streamlit.

Python Streamlit Plotly Pandas


πŸ“Έ Dashboard Preview - Streamlit app -> https://ecommerceanalytics-fmxbstdwrxkesvrzmc8u2b.streamlit.app/

The dashboard ships with 7 fully interactive pages:

Page What you'll find
πŸ“Š Overview KPI scorecards + revenue trend + segment distribution + 6 key insights
πŸ“ˆ Revenue & Trends Monthly revenue/orders, hourly & day-of-week heatmaps, payment mix
πŸ“¦ Product Categories Top 15 categories by revenue & AOV, bubble chart
🚚 Delivery Performance Status breakdown, review scores per status, violin plot, star distribution
πŸ—ΊοΈ Geographic Analysis State-level revenue bar chart, share pie, sortable data table
🎯 RFM Segmentation Customer scatter map, treemap, segment deep-dive table
πŸ”„ Cohort Analysis Retention heatmap, average retention curve

Revenue & Order Growth

Revenue Trend

Top Product Categories

Top Categories

Delivery Impact on Satisfaction

Delivery Impact

Geographic Revenue Distribution

Geographic

RFM Customer Segmentation

RFM Scatter

Cohort Retention Analysis

Cohort Heatmap


πŸ—‚οΈ Project Structure

eCommerceAnalytics/
β”‚
β”œβ”€β”€ app.py                        # Streamlit dashboard (entry point)
β”‚
β”œβ”€β”€ data/                         # Raw Olist CSV files (not committed β€” see below)
β”‚   β”œβ”€β”€ olist_customers_dataset.csv
β”‚   β”œβ”€β”€ olist_orders_dataset.csv
β”‚   β”œβ”€β”€ olist_order_items_dataset.csv
β”‚   β”œβ”€β”€ olist_order_payments_dataset.csv
β”‚   β”œβ”€β”€ olist_order_reviews_dataset.csv
β”‚   β”œβ”€β”€ olist_products_dataset.csv
β”‚   β”œβ”€β”€ olist_sellers_dataset.csv
β”‚   β”œβ”€β”€ olist_geolocation_dataset.csv
β”‚   └── product_category_name_translation.csv
β”‚
β”œβ”€β”€ processed/                    # Pre-computed Parquet files (output of notebooks)
β”‚   β”œβ”€β”€ master.parquet            # 99,441 orders β€” all tables joined + features
β”‚   β”œβ”€β”€ delivered.parquet         # 96,478 delivered orders only
β”‚   └── rfm.parquet               # 93,358 customers with RFM scores & segments
β”‚
β”œβ”€β”€ data_setup.ipynb              # ETL: load, clean, join, feature engineer
β”œβ”€β”€ eda.ipynb                     # Exploratory data analysis
β”œβ”€β”€ RFM_seg_coh.ipynb             # RFM scoring, segmentation & cohort analysis
β”‚
└── myvenv/                       # Python virtual environment

πŸ“Š Dataset

Source: Olist Brazilian E-Commerce Public Dataset (Kaggle)

Table Rows Description
Orders 99,441 Transaction header β€” timestamps, status, delivery dates
Order Items 112,650 Line items β€” products, prices, freight
Payments 103,886 Payment method and value (multiple per order)
Reviews 99,224 Customer ratings (1–5 stars)
Customers 99,441 Demographics β€” state, city
Products 32,951 Catalog β€” categories, dimensions, weight
Sellers 3,095 Seller information
Geolocation 1,000,163 Lat/long mapping for CEP codes

Coverage: September 2016 β†’ October 2018 Β· R$ 13.6M total revenue Β· 96k unique customers


πŸ” Key Findings

1. Delivery Drives Satisfaction β€” Dramatically

Late deliveries average 1.86 β˜… versus 4.31 β˜… for early deliveries β€” a 2.5Γ— gap. With only 4.9% of orders arriving late, logistics improvements have an outsized impact on platform reputation.

2. Geographic Concentration Risk

SΓ£o Paulo + Rio de Janeiro generate 54% of total revenue. States like Minas Gerais, Bahia, and ParanΓ‘ are underserved relative to their population β€” clear expansion targets.

3. Low Repeat-Purchase Rate

Month 1 retention averages just ~10%. Typical of a marketplace (vs. subscription), but even a 5-point improvement would materially lift customer lifetime value.

4. Seasonal Spikes Require Surge Capacity

November 2017 (Black Friday) was the single strongest month β€” 3Γ— the average. Q4 logistics planning is critical to avoid late deliveries exactly when acquisition is highest.

5. Champions Are Rare and Valuable

Only 7% of customers qualify as Champions, yet they generate 13% of revenue. Retention spend on this cohort delivers the highest marginal ROI.

6. Optimal Engagement Window

Peak purchasing occurs Monday–Tuesday at 20:00–21:00 local time. Email and push campaigns timed to this window will see higher conversion rates.


πŸ—οΈ Technical Approach

ETL Pipeline (data_setup.ipynb)

  • Loaded and audited 8 source tables with 1M+ rows
  • Aggregated payments (multi-payment orders β†’ single row)
  • Deduplicated reviews (kept most recent per order)
  • Engineered 7 analytical features: delivery_days, delivery_delta, delivery_status, purchase_month/year/dayofweek/hour, main_category
  • Persisted to Parquet for fast downstream reads

Exploratory Data Analysis (eda.ipynb)

  • Revenue trend analysis with dual-axis monthly view
  • Category performance matrix (revenue, volume, AOV)
  • Delivery status impact on customer ratings
  • State-level geographic breakdown
  • Purchase timing patterns (hourly + day-of-week)

RFM Segmentation & Cohort Analysis (RFM_seg_coh.ipynb)

  • Scored 93k customers on Recency, Frequency, Monetary (quintile 1–5 scale)
  • Classified customers into 8 actionable segments using RFM thresholds
  • Built 23 Γ— 20 cohort retention matrix with month-over-month retention rates

πŸš€ Getting Started

Prerequisites

  • Python 3.10+
  • The Olist dataset downloaded from Kaggle into data/

Installation

# Clone the repository
git clone https://github.com/YOUR_USERNAME/eCommerceAnalytics.git
cd eCommerceAnalytics

# Create and activate virtual environment
python -m venv myvenv
source myvenv/bin/activate        # macOS/Linux
myvenv\Scripts\activate           # Windows

# Install dependencies
pip install -r requirements.txt

Run the Notebooks (optional β€” Parquet files already included)

jupyter notebook data_setup.ipynb     # Step 1 β€” ETL
jupyter notebook eda.ipynb            # Step 2 β€” EDA
jupyter notebook RFM_seg_coh.ipynb    # Step 3 β€” Segmentation

Launch the Dashboard

streamlit run app.py

Then open http://localhost:8501 in your browser.


πŸ“¦ Dependencies

pandas
numpy
plotly
streamlit
pyarrow          # Parquet I/O
jupyter

πŸ’Ό About This Project

This project was built as a data analytics portfolio piece to demonstrate:

  • Data Engineering β€” joining and cleaning multi-table relational datasets at scale
  • Exploratory Analysis β€” deriving business insights from raw transactional data
  • Customer Analytics β€” RFM segmentation and cohort retention modelling
  • Data Visualisation β€” interactive Plotly charts with a production-quality Streamlit UI
  • Storytelling β€” translating analysis into prioritised, actionable business recommendations

πŸ“¬ Contact

Feel free to connect or reach out if you have questions about the methodology or want to discuss the findings.

LinkedIn GitHub

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors