
[Feature]: Retry on failure functionality #2221

Open · neubig opened this issue Jun 3, 2024 · 11 comments
@neubig (Contributor) commented Jun 3, 2024

What problem or use case are you trying to solve?

Sometimes models fail to do their job correctly, and we would benefit from being able to start over from the beginning. There are a few examples of this in the agent literature:

  • Aider recently introduced a harness for SWE-bench evaluation that allows for retries when tests and linting don't pass.
  • @Jiayi-Pan has work on Evaluation and Refinement for web agents that uses a reward model to judge when a web task has failed, a reset mechanism to return to the beginning, and a method for improving prompts based on Reflexion.
  • @niansong1996 has a method, LEVER, that uses a learned verifier to rerank code generation results based on their execution.

Describe the UX of the solution you'd like

Ideally, this would be designed in a general way, so that different strategies could be implemented behind a shared interface. For instance:

from abc import ABC, abstractmethod


class ResetStrategy(ABC):

    @abstractmethod
    def initialize_state(self):
        """Take note of the initial state that should be reset to."""
        ...

    @abstractmethod
    def verify(self) -> bool:
        """Check whether the task has succeeded (e.g. tests pass) or the agent has reached a failure state."""
        ...

    @abstractmethod
    def reset(self):
        """Perform some sort of reset."""
        ...

    @abstractmethod
    def message_on_reset(self) -> str:
        """Create a message to the agent upon reset (e.g. a task with a Reflexion-style prompt)."""
        ...

Then, when using OpenDevin, we could choose an option that says "retry N times when you get stuck", and select the strategy that is used to do so.
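For concreteness, here is a minimal sketch of what that option could look like as a driver loop over the interface above. Note that run_with_retries, the agent.run() and agent.add_message() calls, and the max_retries parameter are all hypothetical illustrations, not existing OpenDevin APIs:

def run_with_retries(agent, strategy: ResetStrategy, max_retries: int = 3) -> bool:
    """Hypothetical driver: run the agent, verify the outcome, reset on failure."""
    strategy.initialize_state()
    for attempt in range(max_retries + 1):
        agent.run()  # hypothetical: one full attempt at the task
        if strategy.verify():  # success, stop retrying
            return True
        if attempt < max_retries:
            strategy.reset()  # roll back to the recorded initial state
            agent.add_message(strategy.message_on_reset())  # hypothetical agent API
    return False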

Do you have thoughts on the technical implementation?

The actual reset strategies would vary based on the task. For instance (a sketch of the Aider-style strategy follows this list):

  • AiderResetStrategy (code reference):
    • initialize_state: save the current git commit of the repository as commit_id
    • verify: tests and linting pass
    • reset: git checkout commit_id
    • message_on_reset: no-op
  • EvalRefineResetStrategy (code reference):
    • initialize_state: save the current web page as initial_page
    • verify: the reward model is positive
    • reset: goto(initial_page)
    • message_on_reset: a Reflexion-style prompt
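To make the interface concrete, here is a minimal sketch of the Aider-style strategy. The repo_dir parameter and the use of `make test` as the test-and-lint command are illustrative assumptions, not OpenDevin code:

import subprocess


class AiderResetStrategy(ResetStrategy):
    """Sketch of a git-based reset strategy, following the outline above."""

    def __init__(self, repo_dir: str):
        self.repo_dir = repo_dir  # assumption: the agent works inside a git repo
        self.commit_id = None

    def initialize_state(self):
        # Record the commit that reset() should return to.
        self.commit_id = subprocess.check_output(
            ["git", "rev-parse", "HEAD"], cwd=self.repo_dir, text=True
        ).strip()

    def verify(self) -> bool:
        # Illustrative stand-in for "tests and linting pass"; a real
        # project would run its own test and lint commands here.
        result = subprocess.run(["make", "test"], cwd=self.repo_dir)
        return result.returncode == 0

    def reset(self):
        # git checkout commit_id, as outlined above.
        subprocess.run(
            ["git", "checkout", self.commit_id], cwd=self.repo_dir, check=True
        )

    def message_on_reset(self) -> str:
        # No-op for this strategy.
        return ""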

This could be integrated into OpenDevin itself, allowing for retries in the main app as well.

Additional context:

@neubig neubig added the enhancement New feature or request label Jun 3, 2024
@neubig neubig changed the title [Feature]: Aider-inspired retries in SWE-Bench evaluation [Feature]: Retry on failure functionality Jun 4, 2024
@Jiayi-Pan (Contributor)

Thanks for creating the issue!
Although I don't have much spare bandwidth at the moment, I am definitely interested in bringing EvalRefineResetStrategy and the retry functionality into OpenDevin. I will keep an eye on this issue and contribute once I have the time.

@mamoodi mamoodi added the medium effort Estimated medium effort label Jul 6, 2024

@github-actions github-actions bot added the Stale Inactive for 30 days label Aug 10, 2024
@xingyaoww xingyaoww removed the Stale Inactive for 30 days label Aug 13, 2024

@github-actions github-actions bot added the Stale Inactive for 30 days label Sep 13, 2024
@enyst enyst removed the Stale Inactive for 30 days label Sep 13, 2024
@Vaishakh-SM (Contributor)

Hi!
Is anyone working on this?

@neubig (Contributor, Author) commented Oct 5, 2024

Hey @Vaishakh-SM, I think nobody is working on this, but @xingyaoww was thinking about adding multiple runs to evaluation. I think that would be a parallel effort though, because it would involve running multiple times and picking the best one, as opposed to restarting when the first try didn't work.

If you'd be interested in taking a look, it'd be welcome!

@Vaishakh-SM (Contributor)

This seems like an interesting problem!

I'll take a look and get back to this sometime this week.


@github-actions github-actions bot added the Stale Inactive for 30 days label Nov 11, 2024
@xingyaoww xingyaoww removed the Stale Inactive for 30 days label Nov 11, 2024
@mamoodi (Collaborator) commented Dec 5, 2024

@neubig this is a really old issue. Just want to make sure: we haven't implemented this yet, right?

@neubig (Contributor, Author) commented Dec 5, 2024

Yep, @xingyaoww is working on a critic that could help implement this.


@github-actions github-actions bot added the Stale Inactive for 30 days label Jan 27, 2025
@xingyaoww xingyaoww removed the Stale Inactive for 30 days label Jan 27, 2025
@xingyaoww xingyaoww modified the milestones: 2025-01, 2025-02 Jan 31, 2025
@rbren rbren modified the milestones: 2025-02, 2025-03 Feb 14, 2025
@manzke commented Feb 25, 2025

Having used OpenHands on a whole project now, I can say that a retry mechanism in general would be nice. I've seen several failures where file contents could not be replaced, or, far more often, could not be generated because of model rate limits. The bigger your codebase gets, the more tokens are used and the more often you hit those rate limits.
This led to several cases where files had been deleted but could not be regenerated. Happy to share more insights.

Labels: enhancement (New feature or request), medium effort (Estimated medium effort)
Projects: Status: In Progress
Development: No branches or pull requests
8 participants