Welcome to the MySQL Skill Builder Challenge!
Today’s session is designed to give you a chance to flex some mysql muscle. There are some basic, and some intermediate questions in this challenge. But no matter your level of exposure or experience with MySql, all questions could be answered with a little help from your AI friends.
All of these questions represent the types of queries we might be expected to write in our day-to-day work as developers, or questions we might ask in our roles as non developers.
As part of this process, we encourage you to use tools like Google and AI where appropriate. These technologies are powerful force multipliers in our work, and understanding how and when to leverage them effectively is key.
In helping to track the impact of these tools, we ask for feedback on each question to indicate which tools (if any), you decided to lean on. If you don't need them, don't use them. But treat it as you would for a normal daily task, reaching out where required for the extra assistance.
You may wonder, why focus on MySQL? While MySQL is nothing new, and often abstracted away by ORMs, it remains an essential tool in our broader skill set. Even though many of our staff don’t use it directly every day, fluency in SQL is fundamental across various roles—from QAs to Developers, Platform Engineers, and Product Managers. A solid understanding of SQL can assist in numerous scenarios in your role.
Please work on your own, and if you get completely stuck, feel free to reach out to the facilitator (in #bls-skills-builder on slack) rather than other team members.
Manage your time wisely, you have up to 90 minutes for 11 questions. What you do not complete at the end of the 90 mins you can mark as not done and submit the form. You must submit the form within 90 minutes, but you can continue with the parts you do not get to in your own time.
All answers can be submitted here, a copy of your submission will be sent to you via email on completion: https://forms.gle/HvwyWfwMpK17L4A19
If you finish early, we have a few bonus questions at the end of this doc to push you a little further with some more advanced challenges. These bonus questions can optionally be done in your own time, and you can submit answers to a different form linked below if you would like to.
Instructions for Query Tool:
You may use any SQL client of your choice to connect to the database.
If you don’t already have one installed, you can use options like MySQL Workbench on Windows, or Sequel Ace on Mac. Others suggestions include DBeaver or Tableplus or some IDE's have this functionality built in directly.
If you don't even know what MySQL is, or what a MySql client is, I suggest you start with a ChatGPT conversation to coach you through a basic explanation and some guidelines on getting the client set up. Use this time as a self study session for your own level with a goal of understanding and solving as many questions as possible.
Where neccesary, test your queries in the tool before submitting answers. You'll have read only access so no writes, updates or changes to table structure will be possible. If you encounter issues, troubleshooting and resolving them is part of the challenge!
At the end of this session, you can keep the convo alive in #bls-skills-builder
Let’s dive in and see how far your skills—and the tools at your disposal—can take you.
Below is a database schema for a simple online marketplace. The marketplace allows users to create listings for items they want to sell, and other users can view these listings and express interest by leaving a message (lead) for the seller. The schema also includes user activity logging for auditing purposes.
We've got a bunch of questions for you to answer based on this database schema. Assume its all MySql 8.0.
Below are the credentials you can use to connect and run queries on the database
- Host:
sandbox.cwvzrdsmrbyu.eu-west-1.rds.amazonaws.com
- **Host 2: (if connection limit reached) ** sandbox-2.cwvzrdsmrbyu.eu-west-1.rds.amazonaws.com
- Username:
bls
- Password:
q3G42vcoUSqnfAbTsfIXbsOmcnLHYjXemBcyH9NW
- Database:
marketplace-skill-builder-challenge
- Port:
3306
This data is all faked, generated, made up data, and only serves the purpose of giving you a challenge to work on. Don't look too closely at the data, but rather understand the schema and relationships between the tables.
The challenge includes questions related to:
- Connecting and retrieving basic data from the database.
- Writing SQL queries to answer "real world" questions about the database marketplace data.
- Debugging and Fixing database queries.
Stores all user information.
Column | Type | Description |
---|---|---|
user_id |
BIGINT UNSIGNED |
Primary Key. Unique identifier for each user. |
username |
VARCHAR(50) |
The username of the user. |
email |
VARCHAR(100) |
Unique email address of the user. |
password_hash |
VARCHAR(255) |
Hashed password for security. |
year_of_birth |
VARCHAR(255) |
Birth Year to calculate their age. |
created_at |
TIMESTAMP |
Timestamp when the user was created. |
updated_at |
TIMESTAMP |
Timestamp when the user was last updated. |
- Indexes:
PRIMARY KEY (user_id)
UNIQUE (email)
to ensure email uniqueness.INDEX (year_of_birth)
Contains all marketplace listings.
Column | Type | Description |
---|---|---|
listing_id |
BIGINT UNSIGNED |
Primary Key. Unique identifier for each listing. |
user_id |
BIGINT UNSIGNED |
Foreign Key to users . The user who created the listing. |
category_id |
BIGINT UNSIGNED |
Foreign Key to categories . The category for the listing. |
title |
VARCHAR(255) |
Title of the listing. |
description |
TEXT |
Detailed description of the listing. |
price |
DECIMAL(10, 2) |
Price of the listing. |
status |
ENUM('active', 'sold', 'expired') |
Status of the listing (active, sold, expired). |
created_at |
TIMESTAMP |
Timestamp when the listing was created. |
updated_at |
TIMESTAMP |
Timestamp when the listing was last updated. |
- Indexes:
PRIMARY KEY (listing_id)
INDEX (user_id)
for efficient user-listing joins.INDEX (category_id)
for categorization queries.
Tracks all leads generated for listings.
Column | Type | Description |
---|---|---|
lead_id |
BIGINT UNSIGNED |
Primary Key. Unique identifier for each lead. |
listing_id |
BIGINT UNSIGNED |
Foreign Key to listings . The listing the lead is associated with. |
user_id |
BIGINT UNSIGNED |
Foreign Key to users . The user who created the lead. Nullable for guest users. |
guest_name |
VARCHAR(100) |
Name of the guest user (nullable if a registered user creates the lead). |
guest_email |
VARCHAR(100) |
Email of the guest user (nullable if a registered user creates the lead). |
message |
TEXT |
Message left by the lead. |
created_at |
TIMESTAMP |
Timestamp when the lead was created. |
- Indexes:
PRIMARY KEY (lead_id)
INDEX (listing_id)
for finding leads by listing.INDEX (user_id)
for querying leads by user.INDEX (listing_id, lead_id)
to optimize performance when querying for leads.
Manages categories and subcategories for listings.
Column | Type | Description |
---|---|---|
category_id |
BIGINT UNSIGNED |
Primary Key. Unique identifier for each category. |
name |
VARCHAR(255) |
Name of the category. |
parent_category_id |
BIGINT UNSIGNED |
Foreign Key to categories . Allows subcategories. Nullable for root categories. |
created_at |
TIMESTAMP |
Timestamp when the category was created. |
updated_at |
TIMESTAMP |
Timestamp when the category was last updated. |
- Indexes:
PRIMARY KEY (category_id)
INDEX (parent_category_id)
for hierarchical queries.
Logs user activity for auditing purposes.
Column | Type | Description |
---|---|---|
log_id |
BIGINT UNSIGNED |
Primary Key. Unique identifier for each log entry. |
user_id |
BIGINT UNSIGNED |
Foreign Key to users . The user performing the action. |
activity_type |
VARCHAR(255) |
Describes the type of activity (e.g., 'login', 'listing_view'). |
activity_details |
JSON |
JSON field containing additional details about the activity (e.g., listing viewed). |
timestamp |
TIMESTAMP |
Timestamp when the activity was logged. |
- Indexes:
PRIMARY KEY (log_id)
INDEX (user_id)
for querying activities by user.INDEX (user_id, timestamp)
for performance optimisation in activity queries.
Question: Connect to the database and write a query to retrieve the most recent listing_id
and title
from the listings
table. Order the result by listing_id
in descending order and return the top result.
Question: Write a query to retrieve all listings (ID and Title) that are currently active (status = 'active'
) and include the username and email of the user who created the listing.
Question: Write a query to return the listing_id
, title
, and the total number of leads for each listing. The result should include all listings, even those that have no leads.
Question: Write a query to retrieve the most recent 10 activities of a user, including the activity_type
, timestamp
and listing_id
from the activity_details
column. Assume the user_id
is 100
Question: Write a query to return all listings in the category "3" (including any subcategories under 3). Assume an unknown depth category tree and include all category children in your query.
Question: The below query misses the indexes defined and needs to be optimised:
SELECT user_id, username FROM users WHERE year_of_birth = 1956
Describe how you would investigate and prove that this misses the index. Describe the change your would make to ensure that it hits the index.
Question: The following query is supposed to return all active listings (status = 'active'
) along with the total number of leads for each listing. However, the query is not counting leads correctly for listings that have no leads. Identify and fix the issue, and ensure the results are ordered by the number of leads in descending order.
SELECT listings.listing_id, listings.title, COUNT(leads.lead_id) AS total_leads
FROM listings
JOIN leads ON listings.listing_id = leads.listing_id
WHERE listings.status = 'active'
GROUP BY listings.listing_id, listings.title
ORDER BY total_leads DESC;
SKIP QueSTION 8
Question: Write a query that returns the total number of leads submitted by guest users (those with NULL
in user_id
) and registered users for each listing. Additionally, the query should return the name of the user who submitted each lead, and for guest users, it should display "Guest"
instead of NULL
. However, the query is not working as expected. Identify and fix the issue.
Question: Write a query to calculate the average time (in hours) between the creation of a listing and the submission of its first lead. The result should be grouped by listing.
The following query retrieves detailed information for users who have at least one active listing, including the total number of leads for those listings, the average price of their active listings, and the most recent activity from the users. Lets assume that no indexes exist on this table yet. The query will be slow and inefficient because no indexes are defined.
Analyze the query, identify all missing indexes, and write the SQL commands to create the necessary indexes that will significantly improve performance.
SELECT
users.user_id,
users.username,
COUNT(DISTINCT leads.lead_id) AS total_leads,
AVG(listings.price) AS avg_price,
MAX(user_activity_log.timestamp) AS last_user_activity
FROM users
JOIN listings ON users.user_id = listings.user_id
JOIN leads ON listings.listing_id = leads.listing_id
LEFT JOIN user_activity_log ON users.user_id = user_activity_log.user_id
WHERE listings.status = 'active'
GROUP BY users.user_id, users.username;
You’ve been alerted that there is a long-running query that is impacting the performance of the database. Your task is to first locate the query and then terminate it immediately.
Provide the SQL commands that will be used to locate the long-running query and to terminate it.
For reference, although not required, here is the SQL to create the tables in the database:
-- Create the `users` table
CREATE TABLE users (
user_id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
username VARCHAR(50) NOT NULL,
email VARCHAR(100) NOT NULL UNIQUE,
year_of_birth VARCHAR(255) NOT NULL,
password_hash VARCHAR(255) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);
-- Create the `categories` table
CREATE TABLE categories (
category_id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(255) NOT NULL,
parent_category_id BIGINT UNSIGNED NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
CONSTRAINT fk_parent_category FOREIGN KEY (parent_category_id) REFERENCES categories (category_id) ON DELETE SET NULL
);
-- Create the `listings` table
CREATE TABLE listings (
listing_id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
user_id BIGINT UNSIGNED NOT NULL,
category_id BIGINT UNSIGNED NULL,
title VARCHAR(255) NOT NULL,
description TEXT,
price DECIMAL (10,
2) NOT NULL,
status ENUM ('active',
'sold',
'expired') DEFAULT 'active',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
CONSTRAINT fk_user_listing FOREIGN KEY (user_id) REFERENCES users (user_id) ON DELETE CASCADE,
CONSTRAINT fk_category_listing FOREIGN KEY (category_id) REFERENCES categories (category_id) ON DELETE SET NULL
);
-- Create the `leads` table
CREATE TABLE leads (
lead_id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
listing_id BIGINT UNSIGNED NOT NULL,
user_id BIGINT UNSIGNED NULL,
guest_name VARCHAR(100) NULL,
guest_email VARCHAR(100) NULL,
message TEXT NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
CONSTRAINT fk_listing_lead FOREIGN KEY (listing_id) REFERENCES listings (listing_id) ON DELETE CASCADE,
CONSTRAINT fk_user_lead FOREIGN KEY (user_id) REFERENCES users (user_id) ON DELETE SET NULL
);
-- Create the `user_activity_log` table
CREATE TABLE user_activity_log (
log_id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
user_id BIGINT UNSIGNED NOT NULL,
activity_type VARCHAR(255) NOT NULL,
activity_details JSON,
timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
CONSTRAINT fk_user_activity FOREIGN KEY (user_id) REFERENCES users (user_id) ON DELETE CASCADE
);
-- Indexes for performance optimisation
CREATE INDEX idx_user_email ON users (email);
CREATE INDEX idx_user_year_of_birth ON users (year_of_birth);
CREATE INDEX idx_listing_user ON listings (user_id);
CREATE INDEX idx_listing_category ON listings (category_id);
CREATE INDEX idx_lead_listing ON leads (listing_id);
CREATE INDEX idx_lead_user ON leads (user_id);
CREATE INDEX idx_category_parent ON categories (parent_category_id);
CREATE INDEX idx_activity_user ON user_activity_log (user_id);
CREATE INDEX idx_listing_lead ON leads (listing_id, lead_id);
CREATE INDEX idx_user_activity_timestamp ON user_activity_log (user_id, timestamp);
Bonus Questions: (Answers can be submitted to a new form here anytime during or after the challenge. No time limit)
Bonus Question 1: Identify Users with Listings that Have a Lower Conversion Rate Than the 75th Percentile
Question: Write a query to identify users whose listings have a conversion rate (leads per listing) below the 75th percentile of conversion rates across all listings. The result should include the user_id
, username
, total listings, total leads, and the user's conversion rate.
Use the NTILE function to assist with the calculations.
Question: Write a query that returns all listings (title, and listing ID) which do not have any leads.
Question: Find Users Who Logged in on Two Consecutive Days. Return their name and ID.
Question: Find the Time Difference Between User Registration and Their First Lead. Return the User ID, Name and the time difference for each user.
Question: More recent MySQL version have features that could improve querying. Look at the below queries and rewrite each one to be more modern and efficient.
A) Find Categories with More Listings Than the Average
SELECT c.*
FROM categories c
WHERE (SELECT COUNT(*) FROM listings l WHERE l.category_id = c.id) > (
SELECT AVG(listing_count)
FROM (SELECT COUNT(*) AS listing_count FROM listings GROUP BY category_id) AS avg_table
);
B) Find Listings with the Most Leads in Each Category
SELECT l.id, l.title, l.category_id
FROM listings l
WHERE l.id IN (
SELECT lead.listing_id
FROM leads lead
GROUP BY lead.listing_id
HAVING COUNT(lead.id) = (
SELECT MAX(lead_count)
FROM (SELECT listing_id, COUNT(*) AS lead_count FROM leads GROUP BY listing_id) AS lead_counts
WHERE lead_counts.listing_id = l.id
)
);
C) Find Users Who Logged in on Two Consecutive Days
SELECT u1.user_id, u1.activity_time
FROM user_activities_log u1
JOIN user_activities_log u2 ON u1.user_id = u2.user_id
WHERE DATE(u2.activity_time) = DATE(u1.activity_time) + INTERVAL 1 DAY