The gsm-hard dataset contains some issues (negative number targets) #12

Open
Madd0g opened this issue Mar 9, 2023 · 1 comment

Madd0g commented Mar 9, 2023

Correct me if I'm wrong, but most of these problems should not have negative solutions, yet I see over a hundred negative target values. The gsm8k file has only 2 negative examples.
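
A rough way to reproduce this count is sketched below. It assumes the dataset is a JSON Lines file in which each record stores its numeric answer under a `target` key; the helper name `negative_targets`, the key name, and the sample records are illustrative assumptions, not something confirmed in this thread.

```python
import json


def negative_targets(lines, key="target"):
    """Return the parsed records whose numeric target is below zero.

    `lines` is any iterable of JSON strings, e.g. an open .jsonl file.
    The `target` key is an assumption about the dataset's schema.
    """
    negatives = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if float(record[key]) < 0:
            negatives.append(record)
    return negatives


# Hypothetical in-memory sample standing in for the real file.
sample = [
    '{"input": "q1", "target": 12.0}',
    '{"input": "q2", "target": -3.5}',
    '{"input": "q3", "target": 0}',
]
print(len(negative_targets(sample)))  # → 1
```

To run it on the real data, pass an open file handle instead of `sample`, adjusting the path and key to match the actual file.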

Thanks

urialon (Collaborator) commented Mar 22, 2023

Hi @Madd0g,
Thank you for your interest in our work!

Your observation is correct. Since the GSM-Hard benchmark was created automatically, it may contain negative target values or "unnatural" positive values.
Unfortunately, we do not have the resources to manually annotate all examples, so we assume a drop of 5%-10% in performance for every model and prompting approach evaluated on this benchmark. Since this penalty is similar across all approaches, we believe that the relative comparison between approaches is still the right thing to measure.
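
For anyone who wants to check whether the relative ordering holds on their own results, here is a minimal sketch that scores predictions on the full set and on the subset with non-negative targets. The `accuracy` helper, the tolerance, and the toy numbers are all made up for illustration; they are not results from the benchmark.

```python
def accuracy(predictions, targets, keep=lambda t: True, tol=1e-6):
    """Fraction of kept examples whose prediction matches the target.

    `keep` selects which targets count, e.g. only non-negative ones.
    """
    pairs = [(p, t) for p, t in zip(predictions, targets) if keep(t)]
    if not pairs:
        return float("nan")
    correct = sum(1 for p, t in pairs if p is not None and abs(p - t) <= tol)
    return correct / len(pairs)


# Toy data: one target is negative, as can happen with generated examples.
targets = [10.0, -4.0, 7.0, 3.0]
approach_a = [10.0, 4.0, 7.0, 3.0]  # hypothetical predictions
approach_b = [10.0, 4.0, 7.0, 5.0]  # hypothetical predictions


def non_negative(t):
    return t >= 0


for name, preds in [("A", approach_a), ("B", approach_b)]:
    full = accuracy(preds, targets)
    filtered = accuracy(preds, targets, keep=non_negative)
    print(f"{name}: full={full:.2f} filtered={filtered:.2f}")
```

In this toy case both approaches lose the same example to the negative target, so the ordering between them is the same on the full set and on the filtered subset.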

Best,
Uri
