Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify validity check + clean up creds #209

Merged
merged 9 commits into from
Aug 13, 2024
Merged

Modify validity check + clean up creds #209

merged 9 commits into from
Aug 13, 2024

Conversation

wendy-aw
Copy link
Contributor

  • Modify translate script to test SQL validity on the exisiting DBs instead of creating empty test DBs. This allows us to flag translated SQL which are invalid due to empty results.
  • Also took the chance to modify 1 question-SQL pair that had empty results, and remove a redundant alternate query.
  • Cleaned up the creds in dialects.py to always refer to db_creds_all in utils.creds

from tqdm import tqdm
from eval.eval import get_all_minimal_queries
import os

tqdm.pandas()

dataset_file = (
"data/instruct_advanced_postgres.csv" # Postgres dataset file to translate
"data/instruct_advanced_postgresTEST.csv" # Postgres dataset file to translate
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo spotted!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! i'll revert to the original

Copy link
Member

@rishsriv rishsriv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Wendy! Looks good to me – thanks for the big changes in dialects.py in particular, that was a very useful blind spot to look at!

Copy link
Collaborator

@wongjingping wongjingping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just 1 small comment!

@@ -54,7 +54,7 @@ Last 30 days = DATE('now', '-30 days') to DATE('now')
PMSPS = per month salesperson signups
To get the total sales amount per salesperson, join the salespersons and sales tables, group by salesperson, and sum the sale_price. Always order results with NULLS last.
Truncate date to week for aggregation."
car_dealership,sqlite,instructions_cte_join,"WITH recent_sales AS (SELECT sp.id, sp.first_name, sp.last_name, COUNT(s.id) AS num_sales FROM salespersons AS sp LEFT JOIN sales AS s ON sp.id = s.salesperson_id WHERE s.sale_date >= DATE('now', '-30 days') GROUP BY sp.id) SELECT id, first_name, last_name, num_sales FROM recent_sales ORDER BY num_sales DESC;WITH recent_sales AS (SELECT sp.id, sp.first_name, sp.last_name, COUNT(s.id) AS num_sales FROM salespersons AS sp LEFT JOIN sales AS s ON sp.id = s.salesperson_id AND s.sale_date >= DATE('now', '-30 days') GROUP BY sp.id, sp.first_name, sp.last_name) SELECT id, first_name, last_name, num_sales FROM recent_sales ORDER BY num_sales DESC;","How many sales did each salesperson make in the past 30 days, inclusive of today's date? Return their ID, first name, last name and number of sales made, ordered from most to least sales.","To get the number of sales made by each salesperson in the past 30 days, join the salespersons and sales tables and filter for sales in the last 30 days.","When using car makes, model names, engine_type, and vin_number, ensure matching is case-insensitive and allows for partial matches using LIKE with wildcards.
car_dealership,sqlite,instructions_cte_join,"WITH recent_sales AS (SELECT sp.id, sp.first_name, sp.last_name, COUNT(s.id) AS num_sales FROM salespersons AS sp LEFT JOIN sales AS s ON sp.id = s.salesperson_id AND s.sale_date >= DATE('now', '-30 days') GROUP BY sp.id) SELECT id, first_name, last_name, num_sales FROM recent_sales ORDER BY num_sales DESC;","How many sales did each salesperson make in the past 30 days, inclusive of today's date? Return their ID, first name, last name and number of sales made, ordered from most to least sales.","To get the number of sales made by each salesperson in the past 30 days, join the salespersons and sales tables and filter for sales in the last 30 days.","When using car makes, model names, engine_type, and vin_number, ensure matching is case-insensitive and allows for partial matches using LIKE with wildcards.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be GROUP BY sp.id, sp.first_name, sp.last_name) instead of GROUP BY sp.id)? Since we're selecting 3 columns

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh sorry i made a mistake. Shouldn't have removed the original because it provided an alternative where zero counts were not included.

On the grouping, I found just grouping by id producing the exact same results so I thought the other grouping cols was redundant. Is there a fundamental difference in meaning that I'm missing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answering my own question: Ohh it's cos of postgres' flexi rules that it still works with grouping only by id. Should be better practice to add all selected cols in GROUP BY. Lemme revert to original!

@wendy-aw wendy-aw merged commit d00ffdf into main Aug 13, 2024
2 checks passed
@wendy-aw wendy-aw deleted the wendy/test_validity branch August 13, 2024 10:01
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants