About the Exec Acc in your paper #7
Besides, I am confused that DIN-SQL has similar Exact Match accuracies in Table 2 and Table 3, but two significantly different Exec accuracies.
Thank you so much for pointing these out. First, the exec acc metric we use to evaluate our model is the official one published here: https://github.com/taoyds/test-suite-sql-eval. This metric, called "Exec acc", actually computes test-suite accuracy, as stated in the repo itself ("This repo contains test suite evaluation metric"). Thus we compared our method with the work of (Liu et al., 2023a) in terms of test-suite accuracy, and their reported test-suite accuracy is 60.1.
I used the official metric to evaluate the results on the dev set that you published in the GPT4_results file, and the Exec acc results are 85.1 for DIN-SQL and 80.1 for few-shot. Maybe you used a different metric in Table 2 and Table 3?
That's interesting; maybe there is a problem with the script we are using. Thank you so much for letting us know.
I also got a different score for GPT4_results.
It's interesting for me as well. Many people told me they got different results on the dev set, and even among those the results were not consistent. We are trying to figure out where the problem is.
Thank you for your great work @MohammadrezaPourreza. I got the same number of 82.8 as @amity871028. I am using the EX accuracy obtained with https://github.com/taoyds/test-suite-sql-eval, which I believe is also the evaluation script used for the Spider test set. I am guessing that you were using https://github.com/taoyds/spider/blob/master/evaluation.py for EX accuracy, which always produces a number a bit lower than the test-suite one. Not sure if I am right, so ignore this if my guess is wrong.
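Since mismatched scores between the two evaluators often come down to how the gold/prediction files are prepared, here is a minimal sketch of formatting results for the test-suite evaluator. The file format (predictions one SQL query per line; gold entries as "SQL<TAB>db_id" per line, in the same order) is an assumption based on the Spider and test-suite-sql-eval READMEs, not something confirmed in this thread, and the `rows` data below is purely illustrative:

```python
def write_eval_files(rows, pred_path, gold_path):
    """Write prediction and gold files side by side.

    Assumed format for taoyds/test-suite-sql-eval:
      - pred file: one SQL query per line
      - gold file: "SQL<TAB>db_id" per line, same order as pred
    """
    with open(pred_path, "w") as pred_f, open(gold_path, "w") as gold_f:
        for row in rows:
            # Collapse internal newlines so each query occupies exactly one line.
            pred_sql = " ".join(row["predicted_sql"].split())
            gold_sql = " ".join(row["gold_sql"].split())
            pred_f.write(pred_sql + "\n")
            gold_f.write(f"{gold_sql}\t{row['db_id']}\n")

# Illustrative example rows (not real DIN-SQL outputs).
rows = [
    {"db_id": "concert_singer",
     "gold_sql": "SELECT count(*) FROM singer",
     "predicted_sql": "SELECT count(*)\nFROM singer"},
]
write_eval_files(rows, "pred.sql", "gold.sql")
```

With the files in place, the evaluator can then be run with something like `python evaluation.py --gold gold.sql --pred pred.sql --db database/ --table tables.json --etype exec` (flag names taken from the test-suite-sql-eval README; double-check against the version you cloned).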
Thank you for your reply! I will wait for your result.
It's probably a network issue.
I connected through a VPN, but it still doesn't work. Why?
Did you enable global mode? Or you could try a proxy that can forward traffic from within China.
Global mode is enabled.
Could you add me on WeChat to help me look into the problem? 15523313206. Much appreciated.
DIN-SQL uses GPT-4. Do you have a GPT-4 API key?
@MohammadrezaPourreza I also got different results when evaluating https://github.com/MohammadrezaPourreza/Few-shot-NL2SQL-with-prompting/blob/main/GPT4_results/DIN-SQL.csv! Is there any update on this issue? This is how I formatted the files for evaluation: din_sql_gold_evalformat.csv. My command:
I see that Liu et al. report an Exec Acc of 70.1 in (Liu et al., 2023a), but your paper reports 60.1. Is this a mistake? Did you use the same evaluation code for Exec?