Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify validity check + clean up creds #209

Merged
merged 9 commits into from
Aug 13, 2024
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion data/instruct_advanced_sqlite.csv
Original file line number Diff line number Diff line change
Expand Up @@ -54,7 +54,7 @@ Last 30 days = DATE('now', '-30 days') to DATE('now')
PMSPS = per month salesperson signups
To get the total sales amount per salesperson, join the salespersons and sales tables, group by salesperson, and sum the sale_price. Always order results with NULLS last.
Truncate date to week for aggregation."
car_dealership,sqlite,instructions_cte_join,"WITH recent_sales AS (SELECT sp.id, sp.first_name, sp.last_name, COUNT(s.id) AS num_sales FROM salespersons AS sp LEFT JOIN sales AS s ON sp.id = s.salesperson_id WHERE s.sale_date >= DATE('now', '-30 days') GROUP BY sp.id) SELECT id, first_name, last_name, num_sales FROM recent_sales ORDER BY num_sales DESC;WITH recent_sales AS (SELECT sp.id, sp.first_name, sp.last_name, COUNT(s.id) AS num_sales FROM salespersons AS sp LEFT JOIN sales AS s ON sp.id = s.salesperson_id AND s.sale_date >= DATE('now', '-30 days') GROUP BY sp.id, sp.first_name, sp.last_name) SELECT id, first_name, last_name, num_sales FROM recent_sales ORDER BY num_sales DESC;","How many sales did each salesperson make in the past 30 days, inclusive of today's date? Return their ID, first name, last name and number of sales made, ordered from most to least sales.","To get the number of sales made by each salesperson in the past 30 days, join the salespersons and sales tables and filter for sales in the last 30 days.","When using car makes, model names, engine_type, and vin_number, ensure matching is case-insensitive and allows for partial matches using LIKE with wildcards.
car_dealership,sqlite,instructions_cte_join,"WITH recent_sales AS (SELECT sp.id, sp.first_name, sp.last_name, COUNT(s.id) AS num_sales FROM salespersons AS sp LEFT JOIN sales AS s ON sp.id = s.salesperson_id AND s.sale_date >= DATE('now', '-30 days') GROUP BY sp.id) SELECT id, first_name, last_name, num_sales FROM recent_sales ORDER BY num_sales DESC;","How many sales did each salesperson make in the past 30 days, inclusive of today's date? Return their ID, first name, last name and number of sales made, ordered from most to least sales.","To get the number of sales made by each salesperson in the past 30 days, join the salespersons and sales tables and filter for sales in the last 30 days.","When using car makes, model names, engine_type, and vin_number, ensure matching is case-insensitive and allows for partial matches using LIKE with wildcards.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be GROUP BY sp.id, sp.first_name, sp.last_name) instead of GROUP BY sp.id)? Since we're selecting 3 columns

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh sorry i made a mistake. Shouldn't have removed the original because it provided an alternative where zero counts were not included.

On the grouping, I found just grouping by id producing the exact same results so I thought the other grouping cols was redundant. Is there a fundamental difference in meaning that I'm missing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answering my own question: Ohh it's cos of postgres' flexi rules that it still works with grouping only by id. Should be better practice to add all selected cols in GROUP BY. Lemme revert to original!

To get the number of sales made by each salesperson in the past 30 days, join the salespersons and sales tables and filter for sales in the last 30 days.
ASP = Calculate the average sale price without specifying the period
GPM = Define gross profit margin as a ratio without specifying how to calculate total revenue or total cost"
Expand Down
2 changes: 1 addition & 1 deletion data/instruct_basic_bigquery.csv
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ derm_treatment,bigquery,basic_group_order_limit,"SELECT ins_type, AVG(height_cm)
derm_treatment,bigquery,basic_group_order_limit,"SELECT specialty, COUNT(*) AS num_doctors FROM derm_treatment.doctors GROUP BY specialty ORDER BY num_doctors DESC NULLS FIRST LIMIT 2;",What are the top 2 specialties by number of doctors? Return the specialty and number of doctors.
derm_treatment,bigquery,basic_left_join,"SELECT p.patient_id, p.first_name, p.last_name FROM derm_treatment.patients AS p LEFT JOIN derm_treatment.treatments AS t ON p.patient_id = t.patient_id WHERE t.patient_id IS NULL;","Return the patient IDs, first names and last names of patients who have not received any treatments."
derm_treatment,bigquery,basic_left_join,"SELECT d.drug_id, d.drug_name FROM derm_treatment.drugs AS d LEFT JOIN derm_treatment.treatments AS t ON d.drug_id = t.drug_id WHERE t.drug_id IS NULL;",Return the drug IDs and names of drugs that have not been used in any treatments.
ewallet,bigquery,basic_join_date_group_order_limit,"SELECT m.name AS merchant_name, COUNT(t.txid) AS total_transactions, SUM(t.amount) AS total_amount FROM ewallet.merchants AS m JOIN ewallet.wallet_transactions_daily AS t ON m.mid = t.receiver_id WHERE t.receiver_type = 1 AND t.created_at >= CURRENT_DATE - INTERVAL '30' DAY GROUP BY merchant_name ORDER BY total_amount DESC NULLS FIRST LIMIT 5;","Who are the top 5 merchants (receiver type 1) by total transaction amount in the past 30 days (inclusive of 30 days ago)? Return the merchant name, total number of transactions, and total transaction amount."
ewallet,bigquery,basic_join_date_group_order_limit,"SELECT m.name AS merchant_name, COUNT(t.txid) AS total_transactions, SUM(t.amount) AS total_amount FROM ewallet.merchants AS m JOIN ewallet.wallet_transactions_daily AS t ON m.mid = t.receiver_id WHERE t.receiver_type = 1 AND t.created_at >= CURRENT_DATE - INTERVAL '150' DAY GROUP BY merchant_name ORDER BY total_amount DESC NULLS FIRST LIMIT 2;","Who are the top 2 merchants (receiver type 1) by total transaction amount in the past 150 days (inclusive of 150 days ago)? Return the merchant name, total number of transactions, and total transaction amount."
ewallet,bigquery,basic_join_date_group_order_limit,"SELECT TIMESTAMP_TRUNC(t.created_at, MONTH) AS MONTH, COUNT(DISTINCT t.sender_id) AS active_users FROM ewallet.wallet_transactions_daily AS t JOIN ewallet.users AS u ON t.sender_id = u.uid WHERE t.sender_type = 0 AND t.status = 'success' AND u.status = 'active' AND t.created_at >= '2023-01-01' AND t.created_at < '2024-01-01' GROUP BY MONTH ORDER BY MONTH NULLS LAST;","How many distinct active users sent money per month in 2023? Return the number of active users per month (as a date), starting from the earliest date. Do not include merchants in the query. Only include successful transactions."
ewallet,bigquery,basic_join_group_order_limit,"SELECT c.code AS coupon_code, COUNT(t.txid) AS redemption_count, SUM(t.amount) AS total_discount FROM ewallet.coupons AS c JOIN ewallet.wallet_transactions_daily AS t ON c.cid = t.coupon_id GROUP BY coupon_code ORDER BY redemption_count DESC NULLS FIRST LIMIT 3;","What are the top 3 most frequently used coupon codes? Return the coupon code, total number of redemptions, and total amount redeemed."
ewallet,bigquery,basic_join_group_order_limit,"SELECT u.country, COUNT(DISTINCT t.sender_id) AS user_count, SUM(t.amount) AS total_amount FROM ewallet.users AS u JOIN ewallet.wallet_transactions_daily AS t ON u.uid = t.sender_id WHERE t.sender_type = 0 GROUP BY u.country ORDER BY total_amount DESC NULLS FIRST LIMIT 5;","Which are the top 5 countries by total transaction amount sent by users, sender_type = 0? Return the country, number of distinct users who sent, and total transaction amount."
Expand Down
2 changes: 1 addition & 1 deletion data/instruct_basic_mysql.csv
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ derm_treatment,mysql,basic_group_order_limit,"SELECT ins_type, AVG(height_cm) AS
derm_treatment,mysql,basic_group_order_limit,"SELECT specialty, COUNT(*) AS num_doctors FROM doctors GROUP BY specialty ORDER BY CASE WHEN num_doctors IS NULL THEN 1 ELSE 0 END DESC, num_doctors DESC LIMIT 2;",What are the top 2 specialties by number of doctors? Return the specialty and number of doctors.
derm_treatment,mysql,basic_left_join,"SELECT p.patient_id, p.first_name, p.last_name FROM patients AS p LEFT JOIN treatments AS t ON p.patient_id = t.patient_id WHERE t.patient_id IS NULL;","Return the patient IDs, first names and last names of patients who have not received any treatments."
derm_treatment,mysql,basic_left_join,"SELECT d.drug_id, d.drug_name FROM drugs AS d LEFT JOIN treatments AS t ON d.drug_id = t.drug_id WHERE t.drug_id IS NULL;",Return the drug IDs and names of drugs that have not been used in any treatments.
ewallet,mysql,basic_join_date_group_order_limit,"SELECT m.name AS merchant_name, COUNT(t.txid) AS total_transactions, SUM(t.amount) AS total_amount FROM ewallet.merchants AS m JOIN ewallet.wallet_transactions_daily AS t ON m.mid = t.receiver_id WHERE t.receiver_type = 1 AND t.created_at >= CURRENT_DATE - INTERVAL 30 DAY GROUP BY m.name ORDER BY total_amount DESC LIMIT 5;","Who are the top 5 merchants (receiver type 1) by total transaction amount in the past 30 days (inclusive of 30 days ago)? Return the merchant name, total number of transactions, and total transaction amount."
ewallet,mysql,basic_join_date_group_order_limit,"SELECT m.name AS merchant_name, COUNT(t.txid) AS total_transactions, SUM(t.amount) AS total_amount FROM ewallet.merchants AS m JOIN ewallet.wallet_transactions_daily AS t ON m.mid = t.receiver_id WHERE t.receiver_type = 1 AND t.created_at >= CURRENT_DATE - INTERVAL 150 DAY GROUP BY m.name ORDER BY total_amount DESC LIMIT 2;","Who are the top 2 merchants (receiver type 1) by total transaction amount in the past 150 days (inclusive of 150 days ago)? Return the merchant name, total number of transactions, and total transaction amount."
ewallet,mysql,basic_join_date_group_order_limit,"SELECT DATE_FORMAT(t.created_at, '%Y-%m-01') AS month, COUNT(DISTINCT t.sender_id) AS active_users FROM ewallet.wallet_transactions_daily AS t JOIN ewallet.users AS u ON t.sender_id = u.uid WHERE t.sender_type = 0 AND t.status = 'success' AND u.status = 'active' AND t.created_at >= '2023-01-01' AND t.created_at < '2024-01-01' GROUP BY month ORDER BY month;","How many distinct active users sent money per month in 2023? Return the number of active users per month (as a date), starting from the earliest date. Do not include merchants in the query. Only include successful transactions."
ewallet,mysql,basic_join_group_order_limit,"SELECT c.code AS coupon_code, COUNT(t.txid) AS redemption_count, SUM(t.amount) AS total_discount FROM coupons AS c JOIN wallet_transactions_daily AS t ON c.cid = t.coupon_id GROUP BY c.code ORDER BY CASE WHEN redemption_count IS NULL THEN 1 ELSE 0 END DESC, redemption_count DESC LIMIT 3;","What are the top 3 most frequently used coupon codes? Return the coupon code, total number of redemptions, and total amount redeemed."
ewallet,mysql,basic_join_group_order_limit,"SELECT u.country, COUNT(DISTINCT t.sender_id) AS user_count, SUM(t.amount) AS total_amount FROM users AS u JOIN wallet_transactions_daily AS t ON u.uid = t.sender_id WHERE t.sender_type = 0 GROUP BY u.country ORDER BY CASE WHEN total_amount IS NULL THEN 1 ELSE 0 END DESC, total_amount DESC LIMIT 5;","Which are the top 5 countries by total transaction amount sent by users, sender_type = 0? Return the country, number of distinct users who sent, and total transaction amount."
Expand Down
2 changes: 1 addition & 1 deletion data/instruct_basic_postgres.csv
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ derm_treatment,basic_group_order_limit,"What are the top 3 insurance types by av
derm_treatment,basic_group_order_limit,What are the top 2 specialties by number of doctors? Return the specialty and number of doctors.,"SELECT specialty, COUNT(*) AS num_doctors FROM doctors GROUP BY specialty ORDER BY num_doctors DESC LIMIT 2"
derm_treatment,basic_left_join,"Return the patient IDs, first names and last names of patients who have not received any treatments.","SELECT p.patient_id, p.first_name, p.last_name FROM patients p LEFT JOIN treatments t ON p.patient_id = t.patient_id WHERE t.patient_id IS NULL"
derm_treatment,basic_left_join,Return the drug IDs and names of drugs that have not been used in any treatments.,"SELECT d.drug_id, d.drug_name FROM drugs d LEFT JOIN treatments t ON d.drug_id = t.drug_id WHERE t.drug_id IS NULL"
ewallet,basic_join_date_group_order_limit,"Who are the top 5 merchants (receiver type 1) by total transaction amount in the past 30 days (inclusive of 30 days ago)? Return the merchant name, total number of transactions, and total transaction amount.","SELECT m.name AS merchant_name, COUNT(t.txid) AS total_transactions, SUM(t.amount) AS total_amount FROM consumer_div.merchants m JOIN consumer_div.wallet_transactions_daily t ON m.mid = t.receiver_id WHERE t.receiver_type = 1 AND t.created_at >= CURRENT_DATE - INTERVAL '30 days' GROUP BY m.name ORDER BY total_amount DESC LIMIT 5"
ewallet,basic_join_date_group_order_limit,"Who are the top 2 merchants (receiver type 1) by total transaction amount in the past 150 days (inclusive of 150 days ago)? Return the merchant name, total number of transactions, and total transaction amount.","SELECT m.name AS merchant_name, COUNT(t.txid) AS total_transactions, SUM(t.amount) AS total_amount FROM consumer_div.merchants m JOIN consumer_div.wallet_transactions_daily t ON m.mid = t.receiver_id WHERE t.receiver_type = 1 AND t.created_at >= CURRENT_DATE - INTERVAL '150 days' GROUP BY m.name ORDER BY total_amount DESC LIMIT 2"
ewallet,basic_join_date_group_order_limit,"How many distinct active users sent money per month in 2023? Return the number of active users per month (as a date), starting from the earliest date. Do not include merchants in the query. Only include successful transactions.","SELECT DATE_TRUNC('month', t.created_at) AS MONTH, COUNT(DISTINCT t.sender_id) AS active_users FROM consumer_div.wallet_transactions_daily t JOIN consumer_div.users u ON t.sender_id = u.uid WHERE t.sender_type = 0 AND t.status = 'success' AND u.status = 'active' AND t.created_at >= '2023-01-01' AND t.created_at < '2024-01-01' GROUP BY MONTH ORDER BY MONTH"
ewallet,basic_join_group_order_limit,"What are the top 3 most frequently used coupon codes? Return the coupon code, total number of redemptions, and total amount redeemed.","SELECT c.code AS coupon_code, COUNT(t.txid) AS redemption_count, SUM(t.amount) AS total_discount FROM consumer_div.coupons c JOIN consumer_div.wallet_transactions_daily t ON c.cid = t.coupon_id GROUP BY c.code ORDER BY redemption_count DESC LIMIT 3"
ewallet,basic_join_group_order_limit,"Which are the top 5 countries by total transaction amount sent by users, sender_type = 0? Return the country, number of distinct users who sent, and total transaction amount.","SELECT u.country, COUNT(DISTINCT t.sender_id) AS user_count, SUM(t.amount) AS total_amount FROM consumer_div.users u JOIN consumer_div.wallet_transactions_daily t ON u.uid = t.sender_id WHERE t.sender_type = 0 GROUP BY u.country ORDER BY total_amount DESC LIMIT 5"
Expand Down
2 changes: 1 addition & 1 deletion data/instruct_basic_sqlite.csv
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ derm_treatment,sqlite,basic_group_order_limit,"SELECT ins_type, AVG(height_cm) A
derm_treatment,sqlite,basic_group_order_limit,"SELECT specialty, COUNT(*) AS num_doctors FROM doctors GROUP BY specialty ORDER BY CASE WHEN num_doctors IS NULL THEN 1 ELSE 0 END DESC, num_doctors DESC LIMIT 2;",What are the top 2 specialties by number of doctors? Return the specialty and number of doctors.
derm_treatment,sqlite,basic_left_join,"SELECT p.patient_id, p.first_name, p.last_name FROM patients AS p LEFT JOIN treatments AS t ON p.patient_id = t.patient_id WHERE t.patient_id IS NULL;","Return the patient IDs, first names and last names of patients who have not received any treatments."
derm_treatment,sqlite,basic_left_join,"SELECT d.drug_id, d.drug_name FROM drugs AS d LEFT JOIN treatments AS t ON d.drug_id = t.drug_id WHERE t.drug_id IS NULL;",Return the drug IDs and names of drugs that have not been used in any treatments.
ewallet,sqlite,basic_join_date_group_order_limit,"SELECT m.name AS merchant_name, COUNT(t.txid) AS total_transactions, SUM(t.amount) AS total_amount FROM merchants AS m JOIN wallet_transactions_daily AS t ON m.mid = t.receiver_id WHERE t.receiver_type = 1 AND t.created_at >= DATE('now', '-30 days') GROUP BY m.name ORDER BY total_amount DESC LIMIT 5;","Who are the top 5 merchants (receiver type 1) by total transaction amount in the past 30 days (inclusive of 30 days ago)? Return the merchant name, total number of transactions, and total transaction amount."
ewallet,sqlite,basic_join_date_group_order_limit,"SELECT m.name AS merchant_name, COUNT(t.txid) AS total_transactions, SUM(t.amount) AS total_amount FROM merchants AS m JOIN wallet_transactions_daily AS t ON m.mid = t.receiver_id WHERE t.receiver_type = 1 AND t.created_at >= DATE('now', '-150 days') GROUP BY m.name ORDER BY total_amount DESC LIMIT 2;","Who are the top 2 merchants (receiver type 1) by total transaction amount in the past 150 days (inclusive of 150 days ago)? Return the merchant name, total number of transactions, and total transaction amount."
ewallet,sqlite,basic_join_date_group_order_limit,"SELECT strftime('%Y-%m', t.created_at) AS month, COUNT(DISTINCT t.sender_id) AS active_users FROM wallet_transactions_daily AS t JOIN users AS u ON t.sender_id = u.uid WHERE t.sender_type = 0 AND t.status = 'success' AND u.status = 'active' AND t.created_at >= '2023-01-01' AND t.created_at < '2024-01-01' GROUP BY month ORDER BY month;","How many distinct active users sent money per month in 2023? Return the number of active users per month (as a date), starting from the earliest date. Do not include merchants in the query. Only include successful transactions."
ewallet,sqlite,basic_join_group_order_limit,"SELECT c.code AS coupon_code, COUNT(t.txid) AS redemption_count, SUM(t.amount) AS total_discount FROM coupons AS c JOIN wallet_transactions_daily AS t ON c.cid = t.coupon_id GROUP BY c.code ORDER BY CASE WHEN redemption_count IS NULL THEN 1 ELSE 0 END DESC, redemption_count DESC LIMIT 3;","What are the top 3 most frequently used coupon codes? Return the coupon code, total number of redemptions, and total amount redeemed."
ewallet,sqlite,basic_join_group_order_limit,"SELECT u.country, COUNT(DISTINCT t.sender_id) AS user_count, SUM(t.amount) AS total_amount FROM users AS u JOIN wallet_transactions_daily AS t ON u.uid = t.sender_id WHERE t.sender_type = 0 GROUP BY u.country ORDER BY CASE WHEN total_amount IS NULL THEN 1 ELSE 0 END DESC, total_amount DESC LIMIT 5;","Which are the top 5 countries by total transaction amount sent by users, sender_type = 0? Return the country, number of distinct users who sent, and total transaction amount."
Expand Down
Loading
Loading