
Performance Results on HumanEval #25

Open
htcml opened this issue Feb 17, 2023 · 1 comment

Comments

@htcml

htcml commented Feb 17, 2023

I am reading your CodeRL paper. It uses the APPS benchmark to compare performance with Codex. Do you have any comparison results on the HumanEval dataset?

@henryhungle
Collaborator

@htcml thanks for reading the paper.

In our case, the HumanEval dataset would not be the best evaluation benchmark. The reason is that HumanEval is framed as a docstring-to-code task, in which the function signature and its docstring (in a code comment block) are given. It is ideal for zero-shot evaluation of larger LMs such as CodeGen and Codex.
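For concreteness, here is a minimal sketch of a HumanEval-style prompt and completion. The `add_two` task is a toy illustration of the format, not an actual HumanEval problem:

```python
# Toy, hypothetical HumanEval-style task: the prompt is the function signature
# plus its docstring, and the model only has to complete the function body.
prompt = '''def add_two(x: int, y: int) -> int:
    """Return the sum of x and y."""
'''

# A hypothetical zero-shot completion from a large LM such as Codex or CodeGen.
completion = "    return x + y\n"

# HumanEval scores functional correctness (pass@k) by running unit tests
# against prompt + completion.
namespace = {}
exec(prompt + completion, namespace)
assert namespace["add_two"](2, 3) == 5
```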

In our paper, we focus more on generating a program from scratch given a natural-language text description of the problem.

One workaround is to reformulate HumanEval as a text-to-code task, but the comparison with the current baselines might not be fair; see the sketch below.
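A rough sketch of what such a reformulation could look like for the same toy task (the wording and names are illustrative, not taken from the paper or from APPS):

```python
# Hypothetical text-to-code reformulation: the natural-language problem
# statement replaces the signature and docstring, and the model must generate
# the entire program, including the function signature.
text_prompt = (
    "Write a Python function add_two(x, y) that returns the sum of the two "
    "integers x and y."
)

# The model would then produce a complete program rather than a completion of
# a given header, for example:
generated_program = '''def add_two(x: int, y: int) -> int:
    return x + y
'''

# Evaluation could reuse the same unit tests as before.
namespace = {}
exec(generated_program, namespace)
assert namespace["add_two"](2, 3) == 5
```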
