Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(protocol-engine): Support door open/close during error recovery #15478

Merged
merged 16 commits into from
Jun 25, 2024

Conversation

SyntaxColoring
Copy link
Contributor

@SyntaxColoring SyntaxColoring commented Jun 20, 2024

Overview

This corrects the behavior when you open the door in the middle of error recovery. Closes EXEC-383.

Test plan

  1. Enter error recovery.
  2. Open the door. The error recovery UI should still be active, but there should be a modal saying to close the door.
  3. Close the door. The error recovery UI should give you some kind of "confirm door closed and continue" button. We have not implemented this on the frontend yet (EXEC-565). You can manually issue an HTTP play action instead.
  4. Error recovery should continue as normal after you click the "confirm and continue" button (once EXEC-565 is implemented) or issue an HTTP play action.

Details

This introduces two new run statuses to accompany the existing awaiting-recovery status:

  1. awaiting-recovery (already existed)
  2. awaiting-recovery-blocked-by-open-door (new)
  3. awaiting-recovery-paused (new)
stateDiagram-v2
    awaiting_recovery: awaiting-recovery
    awaiting_recovery_paused: awaiting-recovery-paused
    awaiting_recovery_blocked_by_open_door: awaiting-recovery-blocked-by-open-door
    stop_requested: stop-requested etc.
    running: running etc.
    
    [*] --> awaiting_recovery: Command fails recoverably
    awaiting_recovery --> awaiting_recovery: HTTP `play` action
    awaiting_recovery --> awaiting_recovery_blocked_by_open_door: Door opens
    awaiting_recovery --> stop_requested: HTTP `stop` action
    awaiting_recovery_blocked_by_open_door --> awaiting_recovery_paused: Door closes
    awaiting_recovery_blocked_by_open_door --> stop_requested: HTTP `stop` action
    awaiting_recovery_paused --> awaiting_recovery: HTTP `play` action
    awaiting_recovery_paused --> stop_requested: HTTP `stop` action

    awaiting_recovery --> running: HTTP `resume-from-recovery` action
    running --> [*]
    stop_requested --> [*]
Loading

Having all 3 is necessary to implement the same behavior that we have for the main protocol part of the run, where after closing the door you need to click a resume button to get the robot moving again.

We can't reuse the existing blocked-by-open-door and paused statuses because various parts of the app look at the status to determine whether to be in recovery mode. We need to maintain an unambiguous awaiting-recovery-* status until error recovery is complete.

Review requests

  • Is adding these new statuses a good idea, or do we actually want to bite the bullet and split this monolithic field up into multiple smaller ones now?
  • Do my frontend changes seem correct? Are there any other places that I missed?
  • Do my docstrings on EngineStatus look correct?
  • I suspect there's a good opportunity to test this more exhaustively with Hypothesis. Does anyone have specific ideas for what invariants we would use?

Risk assessment

Medium. There are a lot of things on the frontend that look at status, and we have to carefully update them all. Consequences for getting it wrong range from buttons being enabled/disabled at the wrong times, to the screen showing an entirely wrong flow.

* Introduce new run `status` values: `awaiting-recovery-paused` and `awaiting-recovery-blocked-by-open-door`.
* Document all `RunStatus` values.
Comment on lines -40 to +46
case EngineStatus.PAUSED | EngineStatus.AWAITING_RECOVERY | EngineStatus.BLOCKED_BY_OPEN_DOOR:
case (
EngineStatus.PAUSED
| EngineStatus.BLOCKED_BY_OPEN_DOOR
| EngineStatus.AWAITING_RECOVERY
| EngineStatus.AWAITING_RECOVERY_PAUSED
| EngineStatus.AWAITING_RECOVERY_BLOCKED_BY_OPEN_DOOR
):
Copy link
Contributor Author

@SyntaxColoring SyntaxColoring Jun 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's ongoing discussion about what the status light should show during error recovery. For now, this retains my initial arbitrary choice, which was to pulse blue to indicate that the user needs to intervene, like a protocol pause.

@SyntaxColoring SyntaxColoring requested review from a team June 24, 2024 21:34
@SyntaxColoring SyntaxColoring marked this pull request as ready for review June 24, 2024 21:34
@SyntaxColoring SyntaxColoring requested review from a team as code owners June 24, 2024 21:34
@SyntaxColoring SyntaxColoring requested review from ncdiehl11 and removed request for a team June 24, 2024 21:34
Copy link
Member

@sfoster1 sfoster1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, I think we're Soon going to want to split it up but not quite yet.

@DerekMaggio
Copy link
Contributor

DerekMaggio commented Jun 25, 2024

Even though it's a bit out of scope for this PR, can we add a test that ensures the door's state does not affect stopping?
Especially the state of the E-Stop being pulled.
While the StopAction and HardwareStoppedAction don't have any logic around the door's state I think we should have a sanity check to verify that stopping always works as expected.

Copy link
Member

@sanni-t sanni-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the possible states' diagram. Although it's getting a bit complex, the state flow and code make sense!

api/src/opentrons/protocol_engine/types.py Outdated Show resolved Hide resolved
Copy link
Contributor

@TamarZanzouri TamarZanzouri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love the diagram! looks good!

@SyntaxColoring
Copy link
Contributor Author

Even though it's a bit out of scope for this PR, can we add a test that ensures the door's state does not affect stopping? Especially the state of the E-Stop being pulled. While the StopAction and HardwareStoppedAction don't have any logic around the door's state I think we should have a sanity check to verify that stopping always works as expected.

Added test_final_state_after_error_recovery_stop(), which is not exhaustive. I'm going to see if Hypothesis can pull weight here. Otherwise it's going to get explosive pretty quickly with all the possible interactions between states and actions.

Copy link
Contributor

A PR has been opened to address analyses snapshot changes. Please review the changes here: #15514

This PR is an automated snapshot update request. Please review the
changes and merge if they are acceptable or find you bug and fix it.

Co-authored-by: SyntaxColoring <[email protected]>
@SyntaxColoring SyntaxColoring requested a review from a team as a code owner June 25, 2024 19:29
@SyntaxColoring SyntaxColoring merged commit 4a45de9 into edge Jun 25, 2024
22 of 23 checks passed
@SyntaxColoring SyntaxColoring deleted the fix_door_blocking branch June 25, 2024 19:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants