
[Bug]: When running eval_infer on SWE-Gym, a compatibility error occurred. #6910

Open
1 task done
lycfight opened this issue Feb 24, 2025 · 7 comments
Labels
bug (Something isn't working) · evaluation (Related to running evaluations with OpenHands)

Comments

@lycfight

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Describe the bug and reproduction steps

(OpenHands) root@cpu01-2050-SWE-bench:~/OpenHands# ./evaluation/benchmarks/swe_bench/scripts/eval_infer.sh /root/OpenHands/evaluation/evaluation_outputs/outputs/SWE-Gym__SWE-Gym-train/CodeActAgent/qwen-max-2025-01-25_maxiter_100_N_v0.25.0-no-hint-run_1/output.jsonl "" SWE-Gym/SWE-Gym train
INSTANCE_ID: 
DATASET_NAME: SWE-Gym/SWE-Gym
SPLIT: train
Evaluating output.jsonl @ /root/OpenHands/evaluation/evaluation_outputs/outputs/SWE-Gym__SWE-Gym-train/CodeActAgent/qwen-max-2025-01-25_maxiter_100_N_v0.25.0-no-hint-run_1
==============================================================
Detecting whether PROCESS_FILEPATH is in OH format or in SWE-bench format
==============================================================
The file IS NOT in SWE-bench format.
Merged output file with fine-grained report will be saved to /root/OpenHands/evaluation/evaluation_outputs/outputs/SWE-Gym__SWE-Gym-train/CodeActAgent/qwen-max-2025-01-25_maxiter_100_N_v0.25.0-no-hint-run_1
2025-02-24 11:54:11,665 - httpx - INFO - HTTP Request: GET https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json "HTTP/1.1 200 OK"
11:54:13 - openhands:INFO: run_infer.py:97 - Using docker image prefix: docker.io/xingyaoww/
11:54:13 - openhands:INFO: eval_infer.py:43 - Using docker image prefix: docker.io/xingyaoww/
Converting /root/OpenHands/evaluation/evaluation_outputs/outputs/SWE-Gym__SWE-Gym-train/CodeActAgent/qwen-max-2025-01-25_maxiter_100_N_v0.25.0-no-hint-run_1/output.jsonl to /root/OpenHands/evaluation/evaluation_outputs/outputs/SWE-Gym__SWE-Gym-train/CodeActAgent/qwen-max-2025-01-25_maxiter_100_N_v0.25.0-no-hint-run_1/output.swebench.jsonl
SWEBENCH_FORMAT_JSONL: /root/OpenHands/evaluation/evaluation_outputs/outputs/SWE-Gym__SWE-Gym-train/CodeActAgent/qwen-max-2025-01-25_maxiter_100_N_v0.25.0-no-hint-run_1/output.swebench.jsonl
==============================================================
Running SWE-bench evaluation
==============================================================
Running SWE-bench evaluation on the whole input file...
<frozen runpy>:128: RuntimeWarning: 'swebench.harness.run_evaluation' found in sys.modules after import of package 'swebench.harness', but prior to execution of 'swebench.harness.run_evaluation'; this may result in unpredictable behaviour
Running 477 unevaluated instances...
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/root/.cache/pypoetry/virtualenvs/openhands-ai-QAHFClL0-py3.12/lib/python3.12/site-packages/swebench/harness/run_evaluation.py", line 569, in <module>
    main(**vars(args))
  File "/root/.cache/pypoetry/virtualenvs/openhands-ai-QAHFClL0-py3.12/lib/python3.12/site-packages/swebench/harness/run_evaluation.py", line 532, in main
    build_env_images(client, dataset, force_rebuild, max_workers)
  File "/root/.cache/pypoetry/virtualenvs/openhands-ai-QAHFClL0-py3.12/lib/python3.12/site-packages/swebench/harness/docker_build.py", line 282, in build_env_images
    build_base_images(client, dataset, force_rebuild)
  File "/root/.cache/pypoetry/virtualenvs/openhands-ai-QAHFClL0-py3.12/lib/python3.12/site-packages/swebench/harness/docker_build.py", line 173, in build_base_images
    test_specs = get_test_specs_from_dataset(dataset)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/pypoetry/virtualenvs/openhands-ai-QAHFClL0-py3.12/lib/python3.12/site-packages/swebench/harness/test_spec.py", line 117, in get_test_specs_from_dataset
    return list(map(make_test_spec, cast(list[SWEbenchInstance], dataset)))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/pypoetry/virtualenvs/openhands-ai-QAHFClL0-py3.12/lib/python3.12/site-packages/swebench/harness/test_spec.py", line 306, in make_test_spec
    specs = MAP_REPO_VERSION_TO_SPECS[repo][version]
            ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
KeyError: 'bokeh/bokeh'
MODEL_NAME_OR_PATH: qwen-max-2025-01-25_maxiter_100_N_v0.25.0-no-hint-run_1
RESULT_OUTPUT_DIR: /root/OpenHands/evaluation/evaluation_outputs/outputs/SWE-Gym__SWE-Gym-train/CodeActAgent/qwen-max-2025-01-25_maxiter_100_N_v0.25.0-no-hint-run_1
mv: cannot stat 'logs/run_evaluation/20250224_115415/qwen-max-2025-01-25_maxiter_100_N_v0.25.0-no-hint-run_1': No such file or directory
mv: cannot stat '/root/OpenHands/evaluation/evaluation_outputs/outputs/SWE-Gym__SWE-Gym-train/CodeActAgent/qwen-max-2025-01-25_maxiter_100_N_v0.25.0-no-hint-run_1/qwen-max-2025-01-25_maxiter_100_N_v0.25.0-no-hint-run_1': No such file or directory
No report file found: Both /root/OpenHands/evaluation/evaluation_outputs/outputs/SWE-Gym__SWE-Gym-train/CodeActAgent/qwen-max-2025-01-25_maxiter_100_N_v0.25.0-no-hint-run_1/report.json and /root/OpenHands/evaluation/evaluation_outputs/outputs/SWE-Gym__SWE-Gym-train/CodeActAgent/qwen-max-2025-01-25_maxiter_100_N_v0.25.0-no-hint-run_1/output.swebench_eval.jsonl do not exist.
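
The KeyError is raised by the upstream swebench harness: its MAP_REPO_VERSION_TO_SPECS table only covers the official SWE-bench repositories, so SWE-Gym repos such as bokeh/bokeh are missing. A quick way to confirm this (a sketch; it assumes the table is importable from swebench.harness.constants, as the traceback's test_spec.py suggests):

```bash
# Prints False on upstream swebench: no specs for SWE-Gym repos such as
# bokeh/bokeh, which is exactly the KeyError in the traceback above.
python -c "from swebench.harness.constants import MAP_REPO_VERSION_TO_SPECS as M; print('bokeh/bokeh' in M)"
```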

OpenHands Installation

Docker command in README

OpenHands Version

No response

Operating System

None

Logs, Errors, Screenshots, and Additional Context

No response

@lycfight
Author

https://github.com/SWE-Gym/SWE-Bench-Fork#swe-bench-fork-for-swe-gym
@xingyaoww
It seems you forgot to add the updated SWE-Bench-Fork for SWE-Gym to the OpenHands environment.
To run eval_infer on the SWE-Gym dataset, a specific version of SWE-Bench is required. Here is a temporary fix:
Change this line:
https://github.com/All-Hands-AI/OpenHands/blob/fab4532f6bef3beedcd8f93939e3658ee13ea27c/pyproject.toml#L144
To this line:
https://github.com/SWE-Gym/OpenHands/blob/e644a2ca45c3623b27a7e6c169e3d479f0a87fbc/pyproject.toml#L135
Then execute:
poetry update swebench
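
For reference, the swap amounts to something like the following in pyproject.toml (a sketch only; the dependency group name and the absence of a pinned revision are illustrative — the two linked lines above are authoritative):

```toml
# Hypothetical illustration of the dependency swap; pin a specific rev in practice.
[tool.poetry.group.evaluation.dependencies]
swebench = { git = "https://github.com/SWE-Gym/SWE-Bench-Fork.git" }
```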

@mamoodi added the evaluation (Related to running evaluations with OpenHands) label on Feb 24, 2025
@xingyaoww
Collaborator

Hi, I think SWE-Gym support is not official in "main" yet. I'd advise using this PR if you want to run SWE-Gym: #6651.

We will add more documentation when that PR wraps up.

@lycfight
Author

> Hi, I think SWE-Gym support is not official in "main" yet. I'd advise using this PR if you want to run SWE-Gym: #6651.
>
> We will add more documentation when that PR wraps up.

I have generated results for the SWE-Gym dataset using OpenHands' run_infer, and I want to evaluate them. However, I encountered compatibility errors while running eval_infer. Is there a simple way to evaluate these results?

@lycfight
Author

> Hi, I think SWE-Gym support is not official in "main" yet. I'd advise using this PR if you want to run SWE-Gym: #6651.
>
> We will add more documentation when that PR wraps up.

I don't quite understand the purpose of running this Docker startup command. I don't need the frontend to run; I just need to run run_infer and eval_infer on SWE-Gym to obtain the correct trajectory data.

Do you mean that using the code from the xw/swegym branch will allow me to correctly run run_infer and eval_infer to obtain the correct trajectory data?

For several weeks now, I've been struggling with bugs caused by the OpenHands version update and by SWE-Gym no longer being maintained. I'm hoping to find a simple way to get OpenHands running on SWE-Gym properly.

@xingyaoww
Collaborator

xingyaoww commented Feb 25, 2025

> Do you mean that using the code from the xw/swegym branch will allow me to correctly run run_infer and eval_infer to obtain the correct trajectory data?

Yes - you just need to switch to that branch and run make build again. The dependencies are correct on that branch.
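
For anyone landing here, a minimal sketch of that workflow (branch name taken from this thread; it assumes xw/swegym exists on the remote you cloned from):

```bash
# Check out the SWE-Gym branch and rebuild so the forked swebench
# dependency (and anything else pinned on that branch) gets installed.
git fetch origin xw/swegym
git checkout xw/swegym
make build
```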

@lycfight
Author

> Do you mean that using the code from the xw/swegym branch will allow me to correctly run run_infer and eval_infer to obtain the correct trajectory data?
>
> Yes - you just need to switch to that branch and run make build again. The dependencies are correct on that branch.

Thank you very much for your response. However, I still have some questions:

  • What is ALLHANDS_API_KEY?
  • Does the rollout_swegym.sh script only support RemoteRuntime? Is it available for use now?
  • Can this script run locally? If not, what modifications are required to make it work?
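
For context on the first question, a hedged sketch (variable names as used in the OpenHands evaluation docs; the URL and whether rollout_swegym.sh honors these variables are assumptions, not confirmed in this thread): OpenHands evaluation scripts generally select the sandbox backend through environment variables.

```bash
# Remote runtime (hosted by All Hands; needs an API key from the team):
export RUNTIME=remote
export ALLHANDS_API_KEY="YOUR-API-KEY"
export SANDBOX_REMOTE_RUNTIME_API_URL="https://runtime.eval.all-hands.dev"

# Local alternative, if the script supports it: run sandboxes in local Docker.
export RUNTIME=docker
```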
