
[Bug]: When running eval_infer on SWE-Gym, a compatibility error occurred. #6910

Open
1 task done
lycfight opened this issue Feb 24, 2025 · 7 comments
Labels
bug (Something isn't working) · evaluation (Related to running evaluations with OpenHands)

Comments

@lycfight

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Describe the bug and reproduction steps

(OpenHands) root@cpu01-2050-SWE-bench:~/OpenHands# ./evaluation/benchmarks/swe_bench/scripts/eval_infer.sh /root/OpenHands/evaluation/evaluation_outputs/outputs/SWE-Gym__SWE-Gym-train/CodeActAgent/qwen-max-2025-01-25_maxiter_100_N_v0.25.0-no-hint-run_1/output.jsonl "" SWE-Gym/SWE-Gym train
INSTANCE_ID: 
DATASET_NAME: SWE-Gym/SWE-Gym
SPLIT: train
Evaluating output.jsonl @ /root/OpenHands/evaluation/evaluation_outputs/outputs/SWE-Gym__SWE-Gym-train/CodeActAgent/qwen-max-2025-01-25_maxiter_100_N_v0.25.0-no-hint-run_1
==============================================================
Detecting whether PROCESS_FILEPATH is in OH format or in SWE-bench format
==============================================================
The file IS NOT in SWE-bench format.
Merged output file with fine-grained report will be saved to /root/OpenHands/evaluation/evaluation_outputs/outputs/SWE-Gym__SWE-Gym-train/CodeActAgent/qwen-max-2025-01-25_maxiter_100_N_v0.25.0-no-hint-run_1
2025-02-24 11:54:11,665 - httpx - INFO - HTTP Request: GET https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json "HTTP/1.1 200 OK"
11:54:13 - openhands:INFO: run_infer.py:97 - Using docker image prefix: docker.io/xingyaoww/
11:54:13 - openhands:INFO: eval_infer.py:43 - Using docker image prefix: docker.io/xingyaoww/
Converting /root/OpenHands/evaluation/evaluation_outputs/outputs/SWE-Gym__SWE-Gym-train/CodeActAgent/qwen-max-2025-01-25_maxiter_100_N_v0.25.0-no-hint-run_1/output.jsonl to /root/OpenHands/evaluation/evaluation_outputs/outputs/SWE-Gym__SWE-Gym-train/CodeActAgent/qwen-max-2025-01-25_maxiter_100_N_v0.25.0-no-hint-run_1/output.swebench.jsonl
SWEBENCH_FORMAT_JSONL: /root/OpenHands/evaluation/evaluation_outputs/outputs/SWE-Gym__SWE-Gym-train/CodeActAgent/qwen-max-2025-01-25_maxiter_100_N_v0.25.0-no-hint-run_1/output.swebench.jsonl
==============================================================
Running SWE-bench evaluation
==============================================================
Running SWE-bench evaluation on the whole input file...
<frozen runpy>:128: RuntimeWarning: 'swebench.harness.run_evaluation' found in sys.modules after import of package 'swebench.harness', but prior to execution of 'swebench.harness.run_evaluation'; this may result in unpredictable behaviour
Running 477 unevaluated instances...
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/root/.cache/pypoetry/virtualenvs/openhands-ai-QAHFClL0-py3.12/lib/python3.12/site-packages/swebench/harness/run_evaluation.py", line 569, in <module>
    main(**vars(args))
  File "/root/.cache/pypoetry/virtualenvs/openhands-ai-QAHFClL0-py3.12/lib/python3.12/site-packages/swebench/harness/run_evaluation.py", line 532, in main
    build_env_images(client, dataset, force_rebuild, max_workers)
  File "/root/.cache/pypoetry/virtualenvs/openhands-ai-QAHFClL0-py3.12/lib/python3.12/site-packages/swebench/harness/docker_build.py", line 282, in build_env_images
    build_base_images(client, dataset, force_rebuild)
  File "/root/.cache/pypoetry/virtualenvs/openhands-ai-QAHFClL0-py3.12/lib/python3.12/site-packages/swebench/harness/docker_build.py", line 173, in build_base_images
    test_specs = get_test_specs_from_dataset(dataset)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/pypoetry/virtualenvs/openhands-ai-QAHFClL0-py3.12/lib/python3.12/site-packages/swebench/harness/test_spec.py", line 117, in get_test_specs_from_dataset
    return list(map(make_test_spec, cast(list[SWEbenchInstance], dataset)))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/pypoetry/virtualenvs/openhands-ai-QAHFClL0-py3.12/lib/python3.12/site-packages/swebench/harness/test_spec.py", line 306, in make_test_spec
    specs = MAP_REPO_VERSION_TO_SPECS[repo][version]
            ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
KeyError: 'bokeh/bokeh'
MODEL_NAME_OR_PATH: qwen-max-2025-01-25_maxiter_100_N_v0.25.0-no-hint-run_1
RESULT_OUTPUT_DIR: /root/OpenHands/evaluation/evaluation_outputs/outputs/SWE-Gym__SWE-Gym-train/CodeActAgent/qwen-max-2025-01-25_maxiter_100_N_v0.25.0-no-hint-run_1
mv: cannot stat 'logs/run_evaluation/20250224_115415/qwen-max-2025-01-25_maxiter_100_N_v0.25.0-no-hint-run_1': No such file or directory
mv: cannot stat '/root/OpenHands/evaluation/evaluation_outputs/outputs/SWE-Gym__SWE-Gym-train/CodeActAgent/qwen-max-2025-01-25_maxiter_100_N_v0.25.0-no-hint-run_1/qwen-max-2025-01-25_maxiter_100_N_v0.25.0-no-hint-run_1': No such file or directory
No report file found: Both /root/OpenHands/evaluation/evaluation_outputs/outputs/SWE-Gym__SWE-Gym-train/CodeActAgent/qwen-max-2025-01-25_maxiter_100_N_v0.25.0-no-hint-run_1/report.json and /root/OpenHands/evaluation/evaluation_outputs/outputs/SWE-Gym__SWE-Gym-train/CodeActAgent/qwen-max-2025-01-25_maxiter_100_N_v0.25.0-no-hint-run_1/output.swebench_eval.jsonl do not exist.
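
The KeyError is raised by the upstream swebench harness: its MAP_REPO_VERSION_TO_SPECS table only covers the official SWE-bench repositories, so SWE-Gym repos such as bokeh/bokeh are missing. A quick way to confirm this (a sketch; it assumes the table is importable from swebench.harness.constants, as the traceback's test_spec.py suggests):

```bash
# Prints False on upstream swebench: no specs for SWE-Gym repos such as
# bokeh/bokeh, which is exactly the KeyError in the traceback above.
python -c "from swebench.harness.constants import MAP_REPO_VERSION_TO_SPECS as M; print('bokeh/bokeh' in M)"
```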

OpenHands Installation

Docker command in README

OpenHands Version

No response

Operating System

None

Logs, Errors, Screenshots, and Additional Context

No response

@lycfight
Author

https://github.com/SWE-Gym/SWE-Bench-Fork#swe-bench-fork-for-swe-gym
@xingyaoww
It seems you forgot to add the updated SWE-Bench-Fork for SWE-Gym to the OpenHands environment.
To run eval_infer on the SWE-Gym dataset, a specific version of SWE-Bench is required. Here is a temporary fix:
Change this line:
https://github.com/All-Hands-AI/OpenHands/blob/fab4532f6bef3beedcd8f93939e3658ee13ea27c/pyproject.toml#L144
To this line:
https://github.com/SWE-Gym/OpenHands/blob/e644a2ca45c3623b27a7e6c169e3d479f0a87fbc/pyproject.toml#L135
Then execute:
poetry update swebench
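
For reference, the swap amounts to something like the following in pyproject.toml (a sketch only; the dependency group name and the absence of a pinned revision are illustrative — the two linked lines above are authoritative):

```toml
# Hypothetical illustration of the dependency swap; pin a specific rev in practice.
[tool.poetry.group.evaluation.dependencies]
swebench = { git = "https://github.com/SWE-Gym/SWE-Bench-Fork.git" }
```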

@mamoodi added the evaluation (Related to running evaluations with OpenHands) label on Feb 24, 2025
@xingyaoww
Collaborator

Hi, I think SWE-Gym support is not official in "main" yet. I'd advise using this PR if you want to run SWE-Gym: #6651.

We will add more documentation when that PR wraps up.

@lycfight
Author

> Hi, I think SWE-Gym support is not official in "main" yet. I'd advise using this PR if you want to run SWE-Gym: #6651.
>
> We will add more documentation when that PR wraps up.

I have generated results for the SWE-Gym dataset using OpenHands' run_infer, and I want to evaluate them. However, I encountered compatibility errors while running eval_infer. Is there a simple way to evaluate these results?

@lycfight
Author

> Hi, I think SWE-Gym support is not official in "main" yet. I'd advise using this PR if you want to run SWE-Gym: #6651.
>
> We will add more documentation when that PR wraps up.

I don't quite understand the purpose of running this Docker startup command. I don't need the frontend to run; I just need to run run_infer and eval_infer on SWE-Gym to obtain the correct trajectory data.

Do you mean that using the code from the xw/swegym branch will allow me to correctly run run_infer and eval_infer to obtain the correct trajectory data?

For several weeks now, I've been struggling with bugs caused by the OpenHands version update and by SWE-Gym no longer being maintained. I'm hoping to find a simple way to get OpenHands running on SWE-Gym properly.

@xingyaoww
Collaborator

xingyaoww commented Feb 25, 2025

> Do you mean that using the code from the xw/swegym branch will allow me to correctly run run_infer and eval_infer to obtain the correct trajectory data?

Yes - you just need to switch to that branch and run make build again. The dependencies are correct on that branch.
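
For anyone landing here, a minimal sketch of that workflow (branch name taken from this thread; it assumes xw/swegym exists on the remote you cloned from):

```bash
# Check out the SWE-Gym branch and rebuild so the forked swebench
# dependency (and anything else pinned on that branch) gets installed.
git fetch origin xw/swegym
git checkout xw/swegym
make build
```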

@lycfight
Author

> Do you mean that using the code from the xw/swegym branch will allow me to correctly run run_infer and eval_infer to obtain the correct trajectory data?
>
> Yes - you just need to switch to that branch and run make build again. The dependencies are correct on that branch.

Thank you very much for your response. However, I still have some questions:

  • What is ALLHANDS_API_KEY?
  • Does the rollout_swegym.sh script only support RemoteRuntime? Is it available for use now?
  • Can this script run locally? If not, what modifications are required to make it work?
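
For context on the first question, a hedged sketch (variable names as used in the OpenHands evaluation docs; the URL and whether rollout_swegym.sh honors these variables are assumptions, not confirmed in this thread): OpenHands evaluation scripts generally select the sandbox backend through environment variables.

```bash
# Remote runtime (hosted by All Hands; needs an API key from the team):
export RUNTIME=remote
export ALLHANDS_API_KEY="YOUR-API-KEY"
export SANDBOX_REMOTE_RUNTIME_API_URL="https://runtime.eval.all-hands.dev"

# Local alternative, if the script supports it: run sandboxes in local Docker.
export RUNTIME=docker
```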
