feat: structured error handling in Responses API streaming#4942
Open

iamemilio wants to merge 1 commit into llamastack:main
Conversation
This pull request has merge conflicts that must be resolved before it can be merged. @iamemilio please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
- Add `_VALID_RESPONSE_ERROR_CODES` allowlist and validate error codes in `extract_openai_error()`, falling back to `server_error` for unmapped codes
- Map provider Chat Completions error codes to Responses API codes (e.g. `invalid_base64` -> `invalid_base64_image`)
- Use `server_error` instead of `invalid_request_error` for unsupported truncation mode
- Enhance `StreamingValidator` to assert error codes are spec-compliant
- Add integration tests with gpt and ollama recordings for streaming failures (truncation=auto, invalid base64 image)
- Add unit tests for error code extraction, mapping, and validation

Co-authored-by: Cursor <cursoragent@cursor.com>
What this does
When a streaming Responses API call fails mid-stream (e.g., the provider rejects an image or hits a rate limit), Llama Stack now returns spec-compliant error codes in the `response.failed` event instead of generic ones.

Before this PR
All streaming errors produced one of two hard-coded codes:

- `internal_error` with `str(exception)` as the message, which leaked Python tracebacks and internal details to the client
- `invalid_request_error` for unsupported truncation, a code that doesn't exist in the Responses API spec

This meant OpenAI-compatible clients couldn't programmatically distinguish between different failure modes (bad image, rate limit, server issue), and raw exception strings could leak implementation details.
After this PR
- When the provider error carries a recognizable code (e.g. `invalid_base64`), we extract it and map it to the correct Responses API code (`invalid_base64_image`). Only codes defined in the spec are emitted; anything unrecognized falls back to `server_error`.
- Unrecognized errors produce `server_error` with a safe message instead of `str(exc)`.
- Unsupported truncation now reports `server_error` (which is an actual spec code) instead of `invalid_request_error`.

User-facing impact
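A minimal sketch of the allowlist-and-map approach described above. The names `_VALID_RESPONSE_ERROR_CODES` and `extract_openai_error()` come from this PR, but the exact contents of the sets and the mapping shown here are illustrative, not the PR's actual tables:

```python
# Sketch of spec-compliant error-code extraction. The allowlist and
# mapping contents below are illustrative examples, not the PR's
# actual tables.
_VALID_RESPONSE_ERROR_CODES = {
    "server_error",
    "rate_limit_exceeded",
    "invalid_base64_image",
    "invalid_prompt",
}

# Provider Chat Completions error codes -> Responses API error codes
_CHAT_TO_RESPONSES_CODE = {
    "invalid_base64": "invalid_base64_image",
}

def extract_openai_error(exc: Exception) -> tuple[str, str]:
    """Return a (code, message) pair safe to emit in a response.failed event."""
    code = getattr(exc, "code", None)
    code = _CHAT_TO_RESPONSES_CODE.get(code, code)
    if code not in _VALID_RESPONSE_ERROR_CODES:
        # Unrecognized codes fall back to server_error with a safe
        # message instead of leaking str(exc) to the client.
        return "server_error", "An internal error occurred."
    return code, getattr(exc, "message", "Request failed.")
```

The key design point is that the allowlist check runs *after* the mapping step, so both unmapped provider codes and entirely unknown codes collapse to `server_error`.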
Clients using the OpenAI SDK or any spec-compliant streaming consumer will now receive meaningful, actionable error codes in `response.failed` events, e.g. `invalid_base64_image` instead of `internal_error`. This lets applications handle different failure modes appropriately (retry on `server_error`, show a user message on `invalid_base64_image`, back off on `rate_limit_exceeded`) without parsing error message strings.

Depends on
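The client-side handling described under user-facing impact can be sketched as a dispatch on the error code. The event attribute path follows the Responses API spec's `response.failed` shape, but `handle_failed` itself is a hypothetical helper, not part of any SDK:

```python
# Illustrative client-side dispatch on spec-compliant error codes from
# a response.failed event. handle_failed is a hypothetical helper; the
# event.response.error.code path assumes the Responses API event shape.
import time

def handle_failed(event) -> str:
    """Decide what to do based on the error code, not the message string."""
    code = event.response.error.code
    if code == "server_error":
        return "retry"              # transient; safe to retry
    if code == "invalid_base64_image":
        return "fix-input"          # surface a message to the user
    if code == "rate_limit_exceeded":
        time.sleep(1.0)             # back off before the next attempt
        return "backoff"
    return "fail"                   # unknown code: give up
```

Before this PR, a client like this would have seen `internal_error` for every failure mode and been forced to parse message strings instead.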
Test plan
- Unit tests for `extract_openai_error()`: unknown codes fall back to `server_error`, valid codes pass through, all spec codes are recognized
- Integration test with `truncation="auto"`: validates the `response.failed` event has a valid error code
- Integration test with an invalid base64 image: validates the provider's `BadRequestError` is mapped to a spec-compliant code in the `response.failed` event
- `StreamingValidator` enhancement: all integration tests now assert that `response.failed` error codes are within the spec-defined set
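The `StreamingValidator` check described in the last bullet could look roughly like this. This is a sketch under assumptions: the function name and the dict-shaped event are illustrative, not Llama Stack's actual test code:

```python
# Sketch of a spec-compliance assertion on response.failed events.
# The function name and event shape are assumptions for illustration,
# not Llama Stack's actual StreamingValidator implementation.
_VALID_RESPONSE_ERROR_CODES = {
    "server_error",
    "rate_limit_exceeded",
    "invalid_base64_image",
    "invalid_prompt",
}

def assert_error_code_is_spec_compliant(event: dict) -> None:
    """Fail loudly if a response.failed event carries a non-spec error code."""
    if event.get("type") != "response.failed":
        return  # only failed events carry an error code to validate
    code = event["response"]["error"]["code"]
    assert code in _VALID_RESPONSE_ERROR_CODES, (
        f"response.failed emitted non-spec error code: {code!r}"
    )
```

Running this over every event in every streaming integration test is what turns the allowlist from a convention into an enforced invariant.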