Skip to content
This repository was archived by the owner on Jan 23, 2026. It is now read-only.

flasher: make unknown exceptions retriable#746

Merged
mangelajo merged 3 commits intomainfrom
fix-flash-retries
Nov 17, 2025
Merged

flasher: make unknown exceptions retriable#746
mangelajo merged 3 commits intomainfrom
fix-flash-retries

Conversation

@mangelajo
Copy link
Copy Markdown
Member

@mangelajo mangelajo commented Nov 14, 2025

Summary by CodeRabbit

  • Bug Fixes

    • Improved error handling with clearer exception categorization and updated retry behavior; unknown errors are now handled more robustly.
    • Retries now use exponential backoff and surface improved diagnostic messages on final failure.
  • Tests

    • Added comprehensive tests for exception categorization, chain traversal, and retry/wrapping behaviors.
  • Chores

    • CLI default fls_version updated to 0.1.9.

@netlify
Copy link
Copy Markdown

netlify bot commented Nov 14, 2025

Deploy Preview for jumpstarter-docs ready!

Name Link
🔨 Latest commit 3b07bb8
🔍 Latest deploy log https://app.netlify.com/projects/jumpstarter-docs/deploys/6917375240aa56000888a95d
😎 Deploy Preview https://deploy-preview-746--jumpstarter-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Nov 14, 2025

Walkthrough

Centralized exception classification was added to the flasher client via _categorize_exception and _find_exception_in_chain. Retry logic now consults the categorization, uses exponential backoff (2**attempt), treats unknown exceptions as retryable (wrapped), and raises FlashError immediately for non-retryable cases. Tests expanded accordingly.

Changes

Cohort / File(s) Summary
Client exception handling refactor
packages/jumpstarter-driver-flashers/jumpstarter_driver_flashers/client.py
Added _categorize_exception and _find_exception_in_chain; removed _get_retryable_error and _get_non_retryable_error; unified retry flow to use categorization; exponential backoff on retries; unknown exceptions wrapped as FlashRetryableError; updated docstrings and logging.
Test suite expansion
packages/jumpstarter-driver-flashers/jumpstarter_driver_flashers/client_test.py
Updated imports to include FlashRetryableError/FlashNonRetryableError; extended MockLogger; added tests for categorization, ExceptionGroup traversal, CancelledError handling, unknown-exception wrapping, and cause-chain preservation.
CLI default update
.../cli (file change noted in summary)
Default fls_version bump to 0.1.9 (from 0.1.5).

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Caller as Caller
    participant Client as FlasherClient
    participant Cat as _categorize_exception
    participant Find as _find_exception_in_chain
    participant Retry as Retry Loop

    Caller->>Client: flash(request)
    Client->>Client: perform operation
    alt operation raises exception
        Client->>Cat: _categorize_exception(exc)
        Cat->>Find: inspect causes / ExceptionGroups
        Find-->>Cat: matching exception found? / None
        alt Non-retryable identified
            Cat-->>Client: FlashNonRetryableError
            Client->>Caller: raise FlashError (non-retryable)
        else Retryable or wrapped as retryable
            Cat-->>Client: FlashRetryableError
            Client->>Retry: attempt < max?
            alt retry allowed
                Retry->>Retry: sleep 2**attempt
                Retry->>Client: retry operation
            else retries exhausted
                Client->>Caller: raise FlashError (with categorized_error)
            end
        end
    else success
        Client->>Caller: return success
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Focus review on _categorize_exception decision branches and priority ordering (non-retryable vs retryable).
  • Verify _find_exception_in_chain correctly traverses ExceptionGroup and nested causes without infinite recursion.
  • Check tests for coverage of edge cases (CancelledError, IOError/OSError aliasing) and that log messages preserve useful context.

Possibly related PRs

  • jumpstarter-dev/jumpstarter#730 — related flasher client changes converting exceptions into FlashRetryableError and adjusting retry behavior.
  • jumpstarter-dev/jumpstarter#696 — earlier introduction of retry/non-retry helpers that this PR replaces/refactors.

Suggested labels

backport release-0.7

Suggested reviewers

  • kirkbrauer

Poem

🐰 A hop, a sniff, a nested error found,
I chased the chains and traced them all around,
Wrapped unknowns to retry with gentle art,
Non-retryables set apart — a tidy heart,
Now flashing hops proceed with judicious bound.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 68.42% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: refactoring exception handling to treat unknown exceptions as retryable through a new categorization system.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix-flash-retries

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
packages/jumpstarter-driver-flashers/jumpstarter_driver_flashers/client.py (2)

209-243: Categorization behavior matches priority rules; consider tightening type hints

The categorization logic (non‑retryable > retryable > unknown wrapped as retryable with preserved __cause__) is sound and aligns with the PR’s intent. As a minor refinement, you could make _find_exception_in_chain generic so _categorize_exception’s declared return type (FlashRetryableError | FlashNonRetryableError) aligns with what the helper returns, e.g. using a TypeVar bound to BaseException and type[TExc] instead of bare type.


245-281: Exception-chain traversal covers ExceptionGroups and causes; same optional typing tweak applies

The recursive walk over .exceptions and __cause__ handles both nested ExceptionGroups and standard cause chains well and is backed by tests. If you adopt the generic TypeVar approach mentioned for _categorize_exception, this helper’s signature can return a precise target_type instance instead of Exception | None, improving static checking without changing behavior.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4f67f42 and 820113f.

📒 Files selected for processing (3)
  • .vscode/settings.json (1 hunks)
  • packages/jumpstarter-driver-flashers/jumpstarter_driver_flashers/client.py (2 hunks)
  • packages/jumpstarter-driver-flashers/jumpstarter_driver_flashers/client_test.py (3 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: mangelajo
Repo: jumpstarter-dev/jumpstarter PR: 610
File: packages/jumpstarter-driver-flashers/jumpstarter_driver_flashers/client.py:488-491
Timestamp: 2025-09-15T08:18:48.571Z
Learning: In the jumpstarter project, code review suggestions should stay focused on the specific scope of the PR. Suggestions about general improvements like timeout handling or error handling that are unrelated to the core changes being made should be avoided, even if they apply to modified code lines.
📚 Learning: 2025-11-05T13:45:58.271Z
Learnt from: mangelajo
Repo: jumpstarter-dev/jumpstarter PR: 735
File: packages/jumpstarter-driver-flashers/jumpstarter_driver_flashers/client.py:15-15
Timestamp: 2025-11-05T13:45:58.271Z
Learning: In packages/jumpstarter-driver-flashers/jumpstarter_driver_flashers/client.py, pexpect is intentionally used as a transitive dependency through the jumpstarter-driver-pyserial package. The flashers package does not declare pexpect as a direct dependency because the pyserial driver package is intended to control the pexpect version.

Applied to files:

  • packages/jumpstarter-driver-flashers/jumpstarter_driver_flashers/client_test.py
  • packages/jumpstarter-driver-flashers/jumpstarter_driver_flashers/client.py
📚 Learning: 2025-11-05T13:31:39.304Z
Learnt from: mangelajo
Repo: jumpstarter-dev/jumpstarter PR: 735
File: packages/jumpstarter-driver-flashers/jumpstarter_driver_flashers/client.py:100-101
Timestamp: 2025-11-05T13:31:39.304Z
Learning: In packages/jumpstarter-driver-flashers/jumpstarter_driver_flashers/client.py, the `flash()` method and its CLI command intentionally have different defaults for `fls_version`: the method defaults to `""` (empty string) while the CLI defaults to `"0.1.5"`. This is by design to provide CLI users with a convenient default while keeping the programmatic API ready for when FLS is included in the flasher images.

Applied to files:

  • packages/jumpstarter-driver-flashers/jumpstarter_driver_flashers/client.py
🧬 Code graph analysis (1)
packages/jumpstarter-driver-flashers/jumpstarter_driver_flashers/client_test.py (1)
packages/jumpstarter-driver-flashers/jumpstarter_driver_flashers/client.py (4)
  • FlashNonRetryableError (37-38)
  • FlashRetryableError (33-34)
  • _categorize_exception (209-243)
  • _find_exception_in_chain (245-281)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Redirect rules - jumpstarter-docs
  • GitHub Check: Header rules - jumpstarter-docs
  • GitHub Check: Pages changed - jumpstarter-docs
  • GitHub Check: pytest-matrix (macos-15, 3.13)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
  • GitHub Check: pytest-matrix (macos-15, 3.12)
  • GitHub Check: pytest-matrix (macos-15, 3.11)
  • GitHub Check: e2e
🔇 Additional comments (5)
.vscode/settings.json (1)

15-18: Editor-only settings change looks good

The added makefile.configureOnOpen and cursorpyright.analysis.extraPaths entries are valid JSON and scoped to tooling only; no runtime impact.

packages/jumpstarter-driver-flashers/jumpstarter_driver_flashers/client.py (1)

180-201: Retry loop now correctly treats categorized errors and unknowns as retriable

The new retry loop cleanly separates non‑retryable (fail fast) from retryable errors and applies exponential backoff, while unknown exceptions are funneled through _categorize_exception so they participate in the retry logic as intended.

packages/jumpstarter-driver-flashers/jumpstarter_driver_flashers/client_test.py (3)

4-22: MockFlasherClient logger shape matches production usage

Extending MockLogger to accept *args, **kwargs and adding exception keeps tests decoupled from logging details while supporting the new logger.exception(...) call sites.


61-125: Categorization unit tests tightly cover primary behaviors

The tests for direct non‑retryable/retryable errors, unknown exception wrapping (including cause preservation), and non‑retryable precedence over retryable in a cause chain give good confidence that _categorize_exception behaves as designed.


127-201: Chain and ExceptionGroup tests validate robustness of traversal logic

The _find_exception_in_chain tests over simple cause chains, mock ExceptionGroups, nested groups, and aliasing (IOError/OSError) are thorough and ensure the new traversal logic is resilient to realistic exception structures.

@mangelajo
Copy link
Copy Markdown
Member Author

tested manually and seems to be working well

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
packages/jumpstarter-driver-flashers/jumpstarter_driver_flashers/client_test.py (1)

63-200: Excellent test coverage for exception categorization logic.

The test suite thoroughly exercises all paths through _categorize_exception and _find_exception_in_chain, including edge cases like nested ExceptionGroups and cause chain traversal.

Consider extracting the MockExceptionGroup class to module level to avoid duplication across test functions (Lines 158-161 and 176-179):

# Near top of file, after imports
class MockExceptionGroup(Exception):
    def __init__(self, message, exceptions):
        super().__init__(message)
        self.exceptions = exceptions

Then use it directly in both test functions without redefining it.

packages/jumpstarter-driver-flashers/jumpstarter_driver_flashers/client.py (1)

210-248: Consider checking CancelledError in the entire exception chain.

The method correctly categorizes exceptions with proper priority handling. However, the CancelledError check at Line 238 only examines the top-level exception. If a CancelledError is wrapped (e.g., as a __cause__), it will be missed and treated as retryable instead of non-retryable.

For consistency with how FlashRetryableError and FlashNonRetryableError are detected, consider searching the cause chain for CancelledError:

     # Second pass: look for retryable errors
     retryable = self._find_exception_in_chain(exception, FlashRetryableError)
     if retryable is not None:
         return retryable
 
-    # CancelledError is a special case that should be treated as non-retryable
-    if isinstance(exception, CancelledError):
-        return FlashNonRetryableError("Operation cancelled")
+    # CancelledError is a special case that should be treated as non-retryable
+    cancelled = self._find_exception_in_chain(exception, CancelledError)
+    if cancelled is not None:
+        return FlashNonRetryableError("Operation cancelled")

Additionally, update the docstring to document the CancelledError special case:

     Priority:
     1. FlashNonRetryableError - highest priority, fail immediately
     2. FlashRetryableError - retry with backoff
-    3. Unknown exceptions - log full stack trace and treat as retryable
+    3. CancelledError - treat as non-retryable
+    4. Unknown exceptions - log full stack trace and treat as retryable
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6bced4a and 3b07bb8.

📒 Files selected for processing (2)
  • packages/jumpstarter-driver-flashers/jumpstarter_driver_flashers/client.py (4 hunks)
  • packages/jumpstarter-driver-flashers/jumpstarter_driver_flashers/client_test.py (3 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: mangelajo
Repo: jumpstarter-dev/jumpstarter PR: 610
File: packages/jumpstarter-driver-flashers/jumpstarter_driver_flashers/client.py:488-491
Timestamp: 2025-09-15T08:18:48.571Z
Learning: In the jumpstarter project, code review suggestions should stay focused on the specific scope of the PR. Suggestions about general improvements like timeout handling or error handling that are unrelated to the core changes being made should be avoided, even if they apply to modified code lines.
📚 Learning: 2025-11-05T13:45:58.271Z
Learnt from: mangelajo
Repo: jumpstarter-dev/jumpstarter PR: 735
File: packages/jumpstarter-driver-flashers/jumpstarter_driver_flashers/client.py:15-15
Timestamp: 2025-11-05T13:45:58.271Z
Learning: In packages/jumpstarter-driver-flashers/jumpstarter_driver_flashers/client.py, pexpect is intentionally used as a transitive dependency through the jumpstarter-driver-pyserial package. The flashers package does not declare pexpect as a direct dependency because the pyserial driver package is intended to control the pexpect version.

Applied to files:

  • packages/jumpstarter-driver-flashers/jumpstarter_driver_flashers/client_test.py
  • packages/jumpstarter-driver-flashers/jumpstarter_driver_flashers/client.py
📚 Learning: 2025-11-05T13:31:39.304Z
Learnt from: mangelajo
Repo: jumpstarter-dev/jumpstarter PR: 735
File: packages/jumpstarter-driver-flashers/jumpstarter_driver_flashers/client.py:100-101
Timestamp: 2025-11-05T13:31:39.304Z
Learning: In packages/jumpstarter-driver-flashers/jumpstarter_driver_flashers/client.py, the `flash()` method and its CLI command intentionally have different defaults for `fls_version`: the method defaults to `""` (empty string) while the CLI defaults to `"0.1.5"`. This is by design to provide CLI users with a convenient default while keeping the programmatic API ready for when FLS is included in the flasher images.

Applied to files:

  • packages/jumpstarter-driver-flashers/jumpstarter_driver_flashers/client.py
🧬 Code graph analysis (1)
packages/jumpstarter-driver-flashers/jumpstarter_driver_flashers/client_test.py (1)
packages/jumpstarter-driver-flashers/jumpstarter_driver_flashers/client.py (4)
  • FlashNonRetryableError (38-39)
  • FlashRetryableError (34-35)
  • _categorize_exception (210-248)
  • _find_exception_in_chain (250-286)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Redirect rules - jumpstarter-docs
  • GitHub Check: Header rules - jumpstarter-docs
  • GitHub Check: Pages changed - jumpstarter-docs
  • GitHub Check: pytest-matrix (macos-15, 3.13)
  • GitHub Check: pytest-matrix (macos-15, 3.11)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
  • GitHub Check: e2e
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
🔇 Additional comments (6)
packages/jumpstarter-driver-flashers/jumpstarter_driver_flashers/client_test.py (2)

1-2: LGTM!

The import of CancelledError is necessary for testing the special handling of cancelled operations.


16-25: LGTM!

Updating the MockLogger to accept *args and **kwargs correctly aligns with real logger interfaces and supports the new exception() method used in the categorization logic.

packages/jumpstarter-driver-flashers/jumpstarter_driver_flashers/client.py (4)

8-8: LGTM!

The CancelledError import is correctly placed and necessary for treating cancelled operations as non-retryable.


181-202: LGTM!

The updated retry logic correctly:

  • Categorizes all exceptions through the centralized _categorize_exception method
  • Fails immediately for non-retryable errors with proper logging
  • Applies exponential backoff (2^attempt seconds) for retryable errors
  • Preserves exception context with from e for debugging

The approach of treating unknown exceptions as retryable (by wrapping) is sound and aligns with the PR objectives.


250-286: LGTM!

The _find_exception_in_chain method correctly implements recursive searching through:

  • ExceptionGroup members
  • The current exception
  • The cause chain (__cause__)
  • Nested ExceptionGroups within the cause chain

The implementation handles all edge cases for exception chain traversal.


1166-1166: LGTM!

The version bump to 0.1.9 for the FLS default is appropriate. The TODO comment provides useful context about the temporary nature of this default.

return result
current = getattr(current, '__cause__', None)
return None
def _find_exception_in_chain(self, exception: Exception, target_type: type) -> Exception | None:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have the leaf_exceptions helper for flattening exception groups.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh wait I missed this... ok I will look into reusing the other implementation, hopefully they are compatible enough.

Copy link
Copy Markdown
Member

@bkhizgiy bkhizgiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good

@mangelajo mangelajo merged commit 68125a7 into main Nov 17, 2025
19 checks passed
@mangelajo mangelajo deleted the fix-flash-retries branch November 17, 2025 11:32
@jumpstarter-backport-bot
Copy link
Copy Markdown

Successfully created backport PR for release-0.7:

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants