Task HumanEval/092 has contradictory tests in Rust #142

Open
geajack opened this issue May 20, 2024 · 5 comments

Comments

@geajack

geajack commented May 20, 2024

The Rust version of HumanEval/092 contains the following lines:

assert_eq!(candidate(3.0, 4.0, 7.0), true);
assert_eq!(candidate(3.0, 4.0, 7.0), false);

(I think this is row 67 of the Hugging Face dataset for MultiPL-E, but I haven't checked.)

This obviously makes the tests unsatisfiable. It looks like a type-casting issue introduced when translating from Python, where the original tests read:

assert candidate(3,4,7)==True, "This prints if this assert fails 9 (also good for debugging!)"
assert candidate(3.0,4,7)==False, "This prints if this assert fails 10 (also good for debugging!)"
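
To illustrate how the two Python inputs can end up identical, here is a minimal sketch in Rust. It assumes a translator that maps every Python numeric literal to an `f64`. The helper names `to_rust_literal`, `translate_call`, and `inputs_collide` are hypothetical and are not part of MultiPL-E; the actual translation code may work differently.

```rust
/// Hypothetical helper (not part of MultiPL-E): mimics a translator that
/// turns every Python numeric literal into an f64 value.
fn to_rust_literal(py_literal: &str) -> f64 {
    // "3" and "3.0" both parse to the same f64, so the int/float
    // distinction that the Python tests rely on is lost here.
    py_literal.parse::<f64>().expect("numeric literal")
}

/// Translates the three arguments of one Python call into the f64 triple
/// that the Rust candidate would receive.
fn translate_call(args: [&str; 3]) -> [f64; 3] {
    args.map(to_rust_literal)
}

/// Returns true when the two Python test inputs are identical after translation.
fn inputs_collide() -> bool {
    let expects_true = translate_call(["3", "4", "7"]); // Python: candidate(3,4,7)==True
    let expects_false = translate_call(["3.0", "4", "7"]); // Python: candidate(3.0,4,7)==False
    expects_true == expects_false
}
```

Under that assumption both calls become `candidate(3.0, 4.0, 7.0)`, and since a function returns one value for one input, no implementation can pass both asserts.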

@arjunguha
Member

wow, thanks. yeah, we should make a decision on how to fix this. I'm going to guess that this affects other typed languages too.

@arjunguha
Member

have you seen HumanEval+ btw? Does that address this?

@geajack
Author

geajack commented May 21, 2024

No, I haven't looked into Eval+

@arjunguha
Member

The original Python problem barely makes sense in a typed language such as Rust:

https://github.com/nuprl/MultiPL-E/blob/main/datasets/originals/HumanEval_92_any_int.py

It's not clear to me if this should be fixed by changing the problem, removing the problem from MultiPL-E, or just left as something that fails.

@Randl
Contributor

Randl commented May 27, 2024

I would read the problem as "the number is an integer" rather than "the type of variable is integer", i.e., I'd expect

assert candidate(3.0,4,7)==True

That, however, would mean the problem doesn't match the original HumanEval, so maybe it's better to just drop it.
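
For concreteness, a rough sketch of that value-based reading in Rust follows. It assumes the task is the one in the linked original (return true when one argument equals the sum of the other two and all three are integers) and that the translated signature takes three `f64` values. The body is an illustration only, not the reference solution or a proposed fix.

```rust
/// Sketch of the value-based reading: a number "is an integer" when it has
/// no fractional part, regardless of the static type it is stored in.
fn any_int(x: f64, y: f64, z: f64) -> bool {
    // fract() returns the fractional part; NaN and infinities fail this check.
    let all_whole = [x, y, z].iter().all(|v| v.fract() == 0.0);
    // One of the three numbers must equal the sum of the other two.
    all_whole && (x + y == z || x + z == y || y + z == x)
}
```

With this reading `any_int(3.0, 4.0, 7.0)` is true, so only the translated `false` assert would have to be dropped, while inputs such as `any_int(2.5, 2.0, 3.0)` are still rejected.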
