
About the Exec Acc in your paper #7

Open
BeachWang opened this issue May 17, 2023 · 16 comments

@BeachWang

I see that Liu et al. report an Exec Acc of 70.1 in (Liu et al., 2023a), but your paper lists 60.1. Is this a mistake? Did you use the same evaluation code for Exec?

@BeachWang
Author

Besides, I am confused that DIN-SQL has similar Exact Match accuracies in Table 2 and Table 3 but two significantly different Exec accuracies.

@MohammadrezaPourreza
Owner

Thank you so much for pointing these out. First, the Exec Acc metric we are using to evaluate our model is the official metric published here: https://github.com/taoyds/test-suite-sql-eval. This metric, called "Exec acc", actually computes test-suite accuracy, as the repo itself states ("This repo contains test suite evaluation metric"). Thus we compared our method with the work of (Liu et al., 2023a) in terms of test-suite accuracy, and their reported test-suite accuracy is 60.1.
Second, Table 2 contains the results of our method on the Spider test set, and Table 3 has the results on the Spider dev set.
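
For readers following along, here is a minimal sketch of how this test-suite metric is typically run on the Spider dev set. The repo URL and flags are the ones cited in this thread; the gold and prediction file names are placeholders, not the authors' actual setup:

# clone the official test-suite evaluation repo referenced above
git clone https://github.com/taoyds/test-suite-sql-eval
cd test-suite-sql-eval
# gold: one "SQL<TAB>db_id" per line; pred: one SQL query per line (per the repo's README)
python3 evaluation.py --gold ./dev_gold.sql --pred ./predictions.sql --db ./database/ --etype exec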

@BeachWang
Author

I used the official metric to evaluate the dev-set results you publish in the GPT4_results file, and the Exec Acc results are 85.1 for DIN-SQL and 80.1 for few-shot. Maybe you used a different metric in Table 2 and Table 3, I guess?

@MohammadrezaPourreza
Owner

That's interesting; maybe there is a problem with the script we are using. Thank you so much for letting us know.

@amity871028

amity871028 commented Jun 1, 2023

I also got a different score for the GPT4_results.
@MohammadrezaPourreza Could I know what your script is?
I use https://github.com/taoyds/test-suite-sql-eval and follow its steps.
If I run a script like this:
python3 evaluation.py --gold ./my_test/gold_example.txt --pred ./my_test/din_sql_pred_sql.txt --db ./database/ --etype exec --plug_value
I get this:
[screenshot: exec accuracy 0.863]
And if I run a script like this:
python3 evaluation.py --gold ./my_test/gold_example.txt --pred ./my_test/din_sql_pred_sql.txt --db ./database/ --etype exec
I get this:
[screenshot: exec accuracy 0.828]
Both 0.863 and 0.828 differ from your paper's result.
I'm curious which part I ran wrongly.
Thanks!
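
A possible explanation for the gap between those two numbers, based on the test-suite repo's documented flags (my reading, not confirmed by the authors): --plug_value substitutes the gold query's values into the predicted query before execution, which usually raises the score when the model's predicted values are imperfect. That would account for 0.863 with the flag versus 0.828 without it:

# with gold values plugged into the predicted queries (typically higher)
python3 evaluation.py --gold ./my_test/gold_example.txt --pred ./my_test/din_sql_pred_sql.txt --db ./database/ --etype exec --plug_value
# with the model's own predicted values (typically lower)
python3 evaluation.py --gold ./my_test/gold_example.txt --pred ./my_test/din_sql_pred_sql.txt --db ./database/ --etype exec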

@MohammadrezaPourreza
Owner

It's interesting for me as well; many people have told me they got different results on the dev set, and even among those, the results were not consistent. We are trying to figure out where the problem is.

@shuaichenchang

Thank you for your great work @MohammadrezaPourreza. I got the same number of 82.8 as @amity871028. I am using the EX accuracy obtained from https://github.com/taoyds/test-suite-sql-eval. I think this is also the evaluation script used for the Spider test set. I am guessing that you were using https://github.com/taoyds/spider/blob/master/evaluation.py for EX accuracy, which always produces a number a bit lower than the test suite's. Not sure if I am right, so ignore this if my guess is wrong.
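
To make this guess concrete, here is how the two scripts would each be invoked. This is a sketch assuming the standard Spider file layout (a tables.json schema file, a gold file, and a predictions file); it is not a confirmed reproduction of the paper's setup:

# 1) classic Spider script -- its EX number tends to run a bit lower
python3 spider/evaluation.py --gold gold.sql --pred pred.sql --db ./database/ --table tables.json --etype exec
# 2) official test-suite script -- the EX number most commenters here report
python3 test-suite-sql-eval/evaluation.py --gold gold.sql --pred pred.sql --db ./database/ --etype exec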

@amity871028

> It's interesting for me as well; many people have told me they got different results on the dev set, and even among those, the results were not consistent. We are trying to figure out where the problem is.

Thank you for your reply! I will wait for your results.

@ShiXiangXiang123

> It's interesting for me as well; many people have told me they got different results on the dev set, and even among those, the results were not consistent. We are trying to figure out where the problem is.
>
> Thank you for your reply! I will wait for your results.

Buddy, could you take a look at my problem?
[screenshot: program stuck after launch]
It keeps hanging like this after I run it, and I can't type anything.

@linxin6

linxin6 commented Jun 9, 2023

It's probably a network issue.

@ShiXiangXiang123

> It's probably a network issue.

I'm going through a VPN and it still doesn't work. Why is that?

@linxin6

linxin6 commented Jun 9, 2023

Have you turned on global mode? Or you could try a proxy that can relay from within China.

@ShiXiangXiang123

> Have you turned on global mode? Or you could try a proxy that can relay from within China.

Global mode is on.

@ShiXiangXiang123

> Thank you for your great work @MohammadrezaPourreza. I got the same number of 82.8 as @amity871028. I am using the EX accuracy obtained from https://github.com/taoyds/test-suite-sql-eval. I think this is also the evaluation script used for the Spider test set. I am guessing that you were using https://github.com/taoyds/spider/blob/master/evaluation.py for EX accuracy, which always produces a number a bit lower than the test suite's. Not sure if I am right, so ignore this if my guess is wrong.

Bro, could you add me on WeChat and help me look into this problem? 15523313206. Many thanks!

@BeachWang
Author

BeachWang commented Jun 14, 2023 via email

@arian-askari

arian-askari commented May 16, 2024

@MohammadrezaPourreza I also got different results when evaluating https://github.com/MohammadrezaPourreza/Few-shot-NL2SQL-with-prompting/blob/main/GPT4_results/DIN-SQL.csv! Is there any update on this issue?
[screenshot: evaluation output]

This is how I formatted the files for evaluation:

din_sql_gold_evalformat.csv
din_sql_prediction_evalformat.csv

My command:

python test-suite-sql-eval-master\evaluation.py --gold din_sql_gold_evalformat.csv --pred din_sql_prediction_evalformat.csv --etype exec --db .\database
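
For anyone else preparing files for this script: to my reading of the test-suite repo's README, the gold file needs one "SQL<TAB>db_id" pair per line and the pred file one SQL query per line, in the same order. A tiny self-contained check (the query and db_id below are just the first Spider dev example, used here as a placeholder):

# build a one-line gold file ("SQL<TAB>db_id") and a matching one-line pred file
printf 'SELECT count(*) FROM singer\tconcert_singer\n' > gold_example.txt
printf 'SELECT count(*) FROM singer\n' > pred_example.txt
# run the evaluator on the pair; expects the Spider databases under ./database/
python3 evaluation.py --gold gold_example.txt --pred pred_example.txt --db ./database/ --etype exec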
