feat: structured error handling in Responses API streaming#4942

Open
iamemilio wants to merge 1 commit into llamastack:main from iamemilio:responses-streaming-errors

iamemilio (Contributor) commented Feb 17, 2026

What this does

When a streaming Responses API call fails mid-stream (e.g., the provider rejects an image or hits a rate limit), Llama Stack now returns spec-compliant error codes in the response.failed event instead of generic ones.

Before this PR

All streaming errors produced one of two hard-coded codes:

  • internal_error with str(exception) as the message, which exposed raw exception text and internal details to the client
  • invalid_request_error for unsupported truncation — a code that doesn't exist in the Responses API spec

This meant OpenAI-compatible clients couldn't programmatically distinguish between different failure modes (bad image, rate limit, server issue), and raw exception strings could leak implementation details.

After this PR

  • Provider errors are mapped to spec codes. When the upstream inference provider returns a structured error (e.g., OpenAI returns invalid_base64), we extract it and map it to the correct Responses API code (invalid_base64_image). Only codes defined in the spec are emitted; anything unrecognized falls back to server_error.
  • No more internal details leaked. Unexpected exceptions now return a generic server_error with a safe message instead of str(exc).
  • Truncation error uses a valid code. invalid_request_error → server_error (which is an actual spec code).
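
The mapping and fallback behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names, the mapping table, the safe message text, and the three-entry allowlist are all assumptions made for the example (the real allowlist covers every code in the spec).

```python
from typing import Optional

# Illustrative subset only (assumption): the real allowlist would contain
# every error code defined by the Responses API spec.
VALID_RESPONSE_ERROR_CODES = frozenset(
    {"server_error", "rate_limit_exceeded", "invalid_base64_image"}
)

# Hypothetical translation table from provider codes to Responses API codes.
PROVIDER_CODE_MAP = {"invalid_base64": "invalid_base64_image"}

# Hypothetical generic message that reveals nothing about the exception.
SAFE_MESSAGE = "An internal error occurred while generating the response."


def to_response_error_code(provider_code: Optional[str]) -> str:
    """Translate a provider error code, falling back to server_error."""
    if provider_code is None:
        return "server_error"
    # Translate known provider codes; leave other codes unchanged for now.
    mapped = PROVIDER_CODE_MAP.get(provider_code, provider_code)
    # Emit only codes on the allowlist; everything else becomes server_error.
    return mapped if mapped in VALID_RESPONSE_ERROR_CODES else "server_error"


def error_for_unexpected_exception(exc: Exception) -> dict:
    """Build an error payload that deliberately ignores str(exc)."""
    return {"code": "server_error", "message": SAFE_MESSAGE}
```

Applying the allowlist after the mapping step means a typo in the mapping table still degrades to server_error rather than emitting a non-spec code.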

User-facing impact

Clients using the OpenAI SDK or any spec-compliant streaming consumer will now receive meaningful, actionable error codes in response.failed events — e.g., invalid_base64_image instead of internal_error. This lets applications handle different failure modes appropriately (retry on server_error, show a user message on invalid_base64_image, back off on rate_limit_exceeded) without parsing error message strings.
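
As a rough client-side sketch of that handling, the function below picks an action from a response.failed event. It assumes the event has already been decoded into a dict with the error under `response.error.code`; the function name and the returned action labels are hypothetical and chosen only for illustration.

```python
def action_for_failed_event(event: dict) -> str:
    """Choose a client action from a decoded streaming event (sketch)."""
    if event.get("type") != "response.failed":
        return "ignore"

    # Assumed event shape: {"type": ..., "response": {"error": {"code": ...}}}
    response = event.get("response")
    error = response.get("error") if isinstance(response, dict) else None
    code = error.get("code") if isinstance(error, dict) else None

    if code == "server_error":
        return "retry"
    if code == "rate_limit_exceeded":
        return "back_off"
    if code == "invalid_base64_image":
        return "show_user_message"
    # Any other code, or a missing one, is surfaced rather than retried blindly.
    return "report"
```

Branching on the code rather than on the message text is the point here: the message can change between releases, while the code set is fixed by the spec.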

Depends on

Test plan

  • 3 unit tests for extract_openai_error(): unknown codes fall back to server_error, valid codes pass through, all spec codes are recognized
  • 8 existing unit tests for error body parsing (nested, direct, missing fields, non-dict bodies, etc.)
  • 2 integration tests with recorded gpt and ollama responses:
    • truncation="auto" → validates response.failed event has a valid error code
    • Invalid base64 image input → validates provider BadRequestError is mapped to a spec-compliant code in the response.failed event
  • StreamingValidator enhancement: all integration tests now assert that response.failed error codes are within the spec-defined set
  • All pre-commit hooks pass (ruff, mypy, etc.)
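
For context on the error body cases listed above (nested, direct, missing fields, non-dict bodies), here is one way such parsing could look. It is a sketch under assumptions, not the helper tested in this PR: the name `parse_error_body` and its return shape are hypothetical.

```python
from typing import Any, Optional, Tuple


def parse_error_body(body: Any) -> Tuple[Optional[str], Optional[str]]:
    """Return (code, message) from a provider error body, or Nones."""
    # Non-dict bodies (strings, None, lists) carry no structured error.
    if not isinstance(body, dict):
        return None, None

    # Nested form: {"error": {"code": ..., "message": ...}}
    # Direct form: {"code": ..., "message": ...}
    inner = body.get("error")
    source = inner if isinstance(inner, dict) else body

    code = source.get("code")
    message = source.get("message")
    # Keep only string values so callers never see unexpected types.
    return (
        code if isinstance(code, str) else None,
        message if isinstance(message, str) else None,
    )
```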

meta-cla bot added the CLA Signed label Feb 17, 2026

mergify bot commented Feb 18, 2026

This pull request has merge conflicts that must be resolved before it can be merged. @iamemilio please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label Feb 18, 2026
iamemilio force-pushed the responses-streaming-errors branch from a6ff9fa to 0c83365 on February 25, 2026 18:55
mergify bot removed the needs-rebase label Feb 25, 2026
iamemilio force-pushed the responses-streaming-errors branch 2 times, most recently from 3e94d51 to f8cacf6 on February 25, 2026 19:10
- Add `_VALID_RESPONSE_ERROR_CODES` allowlist and validate error codes in
  `extract_openai_error()`, falling back to `server_error` for unmapped codes
- Map provider Chat Completions error codes to Responses API codes
  (e.g. `invalid_base64` -> `invalid_base64_image`)
- Use `server_error` instead of `invalid_request_error` for unsupported
  truncation mode
- Enhance `StreamingValidator` to assert error codes are spec-compliant
- Add integration tests with gpt and ollama recordings for streaming
  failures (truncation=auto, invalid base64 image)
- Add unit tests for error code extraction, mapping, and validation

Co-authored-by: Cursor <cursoragent@cursor.com>
iamemilio force-pushed the responses-streaming-errors branch from f8cacf6 to 672ffa6 on February 25, 2026 19:24
iamemilio marked this pull request as ready for review February 25, 2026 19:29