
About the Exec Acc in your paper #7

Open
BeachWang opened this issue May 17, 2023 · 16 comments

@BeachWang

I see that Liu et al. report an Exec Acc of 70.1 in (Liu et al., 2023a), but your paper lists 60.1. Is this a mistake? Did you use the same evaluation code for Exec?

@BeachWang
Author

Besides, I am confused that DIN-SQL has similar Exact Match accuracies in Table 2 and Table 3 but two significantly different Exec accuracies.

@MohammadrezaPourreza
Owner

Thank you so much for pointing these out. First, the Exec Acc metric we are using to evaluate our model is the official metric published here: https://github.com/taoyds/test-suite-sql-eval. This metric, called "Exec acc", actually computes test-suite accuracy, as the repo itself states ("This repo contains test suite evaluation metric"). Thus we compared our method with the work of (Liu et al., 2023a) in terms of test-suite accuracy, and their reported test-suite accuracy is 60.1.
Second, Table 2 contains the results of our method on the Spider test set, and Table 3 has the results on the Spider dev set.
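
For readers following along, here is a minimal sketch of how this test-suite metric is typically run on the Spider dev set. The repo URL and flags are the ones cited in this thread; the gold and prediction file names are placeholders, not the authors' actual setup:

# clone the official test-suite evaluation repo referenced above
git clone https://github.com/taoyds/test-suite-sql-eval
cd test-suite-sql-eval
# gold: one "SQL<TAB>db_id" per line; pred: one SQL query per line (per the repo's README)
python3 evaluation.py --gold ./dev_gold.sql --pred ./predictions.sql --db ./database/ --etype exec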

@BeachWang
Author

I used the official metric to evaluate the dev-set results you publish in the GPT4_results file, and the Exec Acc results are 85.1 for DIN-SQL and 80.1 for few-shot. Maybe you used a different metric in Table 2 and Table 3, I guess?

@MohammadrezaPourreza
Owner

That's interesting; maybe there is a problem with the script we are using. Thank you so much for letting us know.

@amity871028

amity871028 commented Jun 1, 2023

I also got a different score for the GPT4_results.
@MohammadrezaPourreza Could I know what your script is?
I use https://github.com/taoyds/test-suite-sql-eval and follow its steps.
If I run a script like this:
python3 evaluation.py --gold ./my_test/gold_example.txt --pred ./my_test/din_sql_pred_sql.txt --db ./database/ --etype exec --plug_value
I get this:
[screenshot: exec accuracy 0.863]
And if I run a script like this:
python3 evaluation.py --gold ./my_test/gold_example.txt --pred ./my_test/din_sql_pred_sql.txt --db ./database/ --etype exec
I get this:
[screenshot: exec accuracy 0.828]
Both 0.863 and 0.828 differ from your paper's result.
I'm curious which part I ran wrongly.
Thanks!
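
A possible explanation for the gap between those two numbers, based on the test-suite repo's documented flags (my reading, not confirmed by the authors): --plug_value substitutes the gold query's values into the predicted query before execution, which usually raises the score when the model's predicted values are imperfect. That would account for 0.863 with the flag versus 0.828 without it:

# with gold values plugged into the predicted queries (typically higher)
python3 evaluation.py --gold ./my_test/gold_example.txt --pred ./my_test/din_sql_pred_sql.txt --db ./database/ --etype exec --plug_value
# with the model's own predicted values (typically lower)
python3 evaluation.py --gold ./my_test/gold_example.txt --pred ./my_test/din_sql_pred_sql.txt --db ./database/ --etype exec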

@MohammadrezaPourreza
Owner

It's interesting for me as well; many people have told me they got different results on the dev set, and even among those, the results were not consistent. We are trying to figure out where the problem is.

@shuaichenchang

Thank you for your great work @MohammadrezaPourreza. I got the same number of 82.8 as @amity871028. I am using the EX accuracy obtained from https://github.com/taoyds/test-suite-sql-eval. I think this is also the evaluation script used for the Spider test set. I am guessing that you were using https://github.com/taoyds/spider/blob/master/evaluation.py for EX accuracy, which always produces a number a bit lower than the test suite's. Not sure if I am right, so ignore this if my guess is wrong.
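
To make this guess concrete, here is how the two scripts would each be invoked. This is a sketch assuming the standard Spider file layout (a tables.json schema file, a gold file, and a predictions file); it is not a confirmed reproduction of the paper's setup:

# 1) classic Spider script -- its EX number tends to run a bit lower
python3 spider/evaluation.py --gold gold.sql --pred pred.sql --db ./database/ --table tables.json --etype exec
# 2) official test-suite script -- the EX number most commenters here report
python3 test-suite-sql-eval/evaluation.py --gold gold.sql --pred pred.sql --db ./database/ --etype exec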

@amity871028

> It's interesting for me as well; many people have told me they got different results on the dev set, and even among those, the results were not consistent. We are trying to figure out where the problem is.

Thank you for your reply! I will wait for your results.

@ShiXiangXiang123

> It's interesting for me as well; many people have told me they got different results on the dev set, and even among those, the results were not consistent. We are trying to figure out where the problem is.
>
> Thank you for your reply! I will wait for your results.

Buddy, could you take a look at my problem?
[screenshot: program stuck after launch]
It keeps hanging like this after I run it, and I can't type anything.

@linxin6

linxin6 commented Jun 9, 2023

It's probably a network issue.

@ShiXiangXiang123

> It's probably a network issue.

I'm going through a VPN and it still doesn't work. Why is that?

@linxin6

linxin6 commented Jun 9, 2023

Have you turned on global mode? Or you could try a proxy that can relay from within China.

@ShiXiangXiang123

> Have you turned on global mode? Or you could try a proxy that can relay from within China.

Global mode is on.

@ShiXiangXiang123

> Thank you for your great work @MohammadrezaPourreza. I got the same number of 82.8 as @amity871028. I am using the EX accuracy obtained from https://github.com/taoyds/test-suite-sql-eval. I think this is also the evaluation script used for the Spider test set. I am guessing that you were using https://github.com/taoyds/spider/blob/master/evaluation.py for EX accuracy, which always produces a number a bit lower than the test suite's. Not sure if I am right, so ignore this if my guess is wrong.

Bro, could you add me on WeChat and help me look into this problem? 15523313206. Many thanks!

@BeachWang
Author

BeachWang commented Jun 14, 2023 via email

@arian-askari

arian-askari commented May 16, 2024

@MohammadrezaPourreza I also got different results when evaluating https://github.com/MohammadrezaPourreza/Few-shot-NL2SQL-with-prompting/blob/main/GPT4_results/DIN-SQL.csv! Is there any update on this issue?
[screenshot: evaluation output]

This is how I formatted the files for evaluation:

din_sql_gold_evalformat.csv
din_sql_prediction_evalformat.csv

My command:

python test-suite-sql-eval-master\evaluation.py --gold din_sql_gold_evalformat.csv --pred din_sql_prediction_evalformat.csv --etype exec --db .\database
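
For anyone else preparing files for this script: to my reading of the test-suite repo's README, the gold file needs one "SQL<TAB>db_id" pair per line and the pred file one SQL query per line, in the same order. A tiny self-contained check (the query and db_id below are just the first Spider dev example, used here as a placeholder):

# build a one-line gold file ("SQL<TAB>db_id") and a matching one-line pred file
printf 'SELECT count(*) FROM singer\tconcert_singer\n' > gold_example.txt
printf 'SELECT count(*) FROM singer\n' > pred_example.txt
# run the evaluator on the pair; expects the Spider databases under ./database/
python3 evaluation.py --gold gold_example.txt --pred pred_example.txt --db ./database/ --etype exec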
