fix(helpers): honor x-should-retry header in runner-helper retry classifier #1692
Open
WalkingDreams798 wants to merge 1 commit into
…sifier

`is_fatal_status_error` (used by the environments poller, session tool runner, and worker heartbeat to decide whether a failure is worth retrying) classified errors purely by status code and ignored the server's `x-should-retry` response header, even though its docstring claims it "aligns with the core client's _should_retry policy". The core client (`_base_client._should_retry`) honors the header first.

This caused two divergences from the core client:
- a 4xx carrying `x-should-retry: true` was treated as fatal (the runner stops) when the server explicitly asked to retry;
- a 429 (or 5xx) carrying `x-should-retry: false` was retried when the server explicitly asked not to.

Honor the header first (`true` -> not fatal, `false` -> fatal, regardless of status), then fall back to the existing 4xx code logic. Added unit tests.
What
`is_fatal_status_error` in `lib/_retry.py` (used by the environments poller, the session tool runner, and the worker heartbeat to decide whether a failure is worth retrying) classified errors purely by status code and ignored the server's `x-should-retry` response header. Its docstring nonetheless claims it "Aligns with the core client's `_should_retry` policy", and the core client (`_base_client._should_retry`) honors that header first.

Impact
Two divergences from the core client's retry behavior:
- a 4xx carrying `x-should-retry: true` → treated as fatal, so the runner stops, even though the server explicitly asked to retry;
- a 429 (or 5xx) carrying `x-should-retry: false` → retried, even though the server explicitly asked not to.

Repro
Change
Honor `x-should-retry` first (`true` is never fatal, `false` is always fatal, regardless of status code), then fall back to the existing 4xx-code logic. This makes the classifier match the documented intent and the core client.

Tests
Added `tests/lib/test_retry.py`: plain 4xx fatal, transient 4xx (408/409/429) not fatal, 5xx not fatal, `x-should-retry` override in both directions, and non-status errors (transport/connection) not fatal.

Verification
- `uv run pytest tests/lib/test_retry.py` → 15 passed
- `uv run ruff check` → all checks passed
- `uv run pyright` (strict) → 0 errors, 0 warnings
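
For reviewers, the header-first ordering described under Change can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in `lib/_retry.py`: the function name `is_fatal_status`, its signature (a status code plus a plain header mapping), and the exact-match comparison on the header value are all assumptions made for the example. Non-status errors (transport/connection) are out of scope here because they carry no status code.

```python
from typing import Mapping, Optional

# 4xx codes that the test description above lists as transient.
TRANSIENT_4XX = frozenset({408, 409, 429})


def is_fatal_status(
    status_code: int,
    headers: Optional[Mapping[str, str]] = None,
) -> bool:
    """Hypothetical stand-in for the classifier discussed in this PR.

    Returns True when the failure should stop the caller, False when it
    is worth retrying.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {name.lower(): value for name, value in (headers or {}).items()}
    should_retry = normalised.get("x-should-retry")

    # 1. An explicit server instruction wins, regardless of status code.
    #    (Assumption: the value is compared as the exact strings below.)
    if should_retry == "true":
        return False
    if should_retry == "false":
        return True

    # 2. No usable header: fall back to status-code logic.
    #    A 4xx is fatal unless it is one of the transient codes.
    if 400 <= status_code < 500:
        return status_code not in TRANSIENT_4XX

    # 5xx (and anything else) is treated as retryable.
    return False
```

With this ordering, `is_fatal_status(400, {"x-should-retry": "true"})` is not fatal and `is_fatal_status(429, {"x-should-retry": "false"})` is fatal, which are the two cases listed under Impact.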