Ecommerce Data Model and Analysis

Dataset

The dataset has been anonymized and comprises a subset of more than 3 million grocery orders placed by over 200,000 users on Instacart. Each user's dataset includes a variable number of orders, ranging from 4 to 100, showcasing the sequence of products purchased in each order. Additionally, the dataset includes information about the week and time of day when the orders were placed, along with a relative measure of time between consecutive orders.

You can download the dataset from Kaggle.

Entity-Relationship (ER) Model

Data Model

The data model consists of the following tables:

aisles: Contains information about different aisles.
departments: Contains information about different departments.
products: Contains information about different products, including their aisle and department.
orders: Contains information about orders, including user ID, order number, and timing information.
order_products: Represents the relationship between orders and products, including additional information such as add-to-cart order and whether the product was reordered.

CREATE TABLE aisles (
    aisle_id INTEGER PRIMARY KEY,
    aisle VARCHAR(255)
);

CREATE TABLE departments (
    department_id INTEGER PRIMARY KEY,
    department VARCHAR(255)
);

CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    aisle_id INTEGER,
    department_id INTEGER,
    product_name VARCHAR(255),
    FOREIGN KEY (aisle_id) REFERENCES aisles (aisle_id),
    FOREIGN KEY (department_id) REFERENCES departments (department_id)
);

CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    user_id INTEGER,
    order_number INTEGER,
    order_dow INTEGER,
    order_hour_of_day INTEGER,
    days_since_prior_order INTEGER
);

CREATE TABLE order_products (
    order_id INTEGER,
    product_id INTEGER,
    add_to_cart_order INTEGER,
    reordered INTEGER,
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (order_id) REFERENCES orders (order_id),
    FOREIGN KEY (product_id) REFERENCES products (product_id)
);

Instructions

Clone this repository to your local machine.
Download the dataset from Kaggle and place the CSV files in the data/ folder.
Open and run the data_model.ipynb Jupyter Notebook to create the data model and insert data into PostgreSQL.
Run the SQL queries in practice_questions.sql to perform analysis on the dataset.

Install Packages

pip install pandas
pip install psycopg2
pip install SQLAlchemy

Practice Questions

Create a temporary table that joins the orders, order_products, and products tables to get information about each order, including the products that were purchased and their department and aisle information.
Create a temporary table that groups the orders by product and finds the total number of times each product was purchased, the total number of times each product was reordered, and the average number of times each product was added to a cart.
Create a temporary table that groups the orders by department and finds the total number of products purchased, the total number of unique products purchased, the total number of products purchased on weekdays vs weekends, and the average time of day that products in each department are ordered.
Create a temporary table that groups the orders by aisle and finds the top 10 most popular aisles, including the total number of products purchased and the total number of unique products purchased from each aisle.
Combine the information from the previous temporary tables into a final table that shows the product ID, product name, department ID, department name, aisle ID, aisle name, total number of times purchased, total number of times reordered, average number of times added to cart, total number of products purchased, total number of unique products purchased, total number of products purchased on weekdays, total number of products purchased on weekends, and average time of day products are ordered in each department.

Click here to access my solution to the practice questions

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
README.md		README.md
data_model.ipynb		data_model.ipynb
er_model.jpg		er_model.jpg
practice_questions.sql		practice_questions.sql

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Ecommerce Data Model and Analysis

Dataset

Entity-Relationship (ER) Model

Data Model

Contents

Instructions

Install Packages

Practice Questions

About

Uh oh!

Releases

Packages

Languages

elvarlax/ecommerce-data-model-analysis

Folders and files

Latest commit

History

Repository files navigation

Ecommerce Data Model and Analysis

Dataset

Entity-Relationship (ER) Model

Data Model

Contents

Instructions

Install Packages

Practice Questions

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages