feat: implement vision support for CustomOpenAIClient #2207
optimusbuilder wants to merge 2 commits into
CustomOpenAIClient ignored screenshot inputs and logged a warning instead of forwarding them to OpenAI-compatible providers. Append image_url user messages the same way OpenAIClient does so observe/extract vision flows work with Ollama, LM Studio, and other compatible backends. Co-authored-by: Cursor <cursoragent@cursor.com>
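As a rough illustration of the image_url user message described above, here is a minimal sketch. The type names and `buildImageMessage` are hypothetical and not taken from the PR; the JPEG data URL is an assumption about the screenshot format, and the buffer type is narrowed to the one method the sketch needs (a Node Buffer satisfies it).

```typescript
// Hypothetical minimal types for illustration; the real client types are richer.
interface ImageInput {
  // Anything with a base64 toString works here, e.g. a Node Buffer.
  buffer: { toString(encoding: "base64"): string };
  description?: string;
}

type ContentPart =
  | { type: "image_url"; image_url: { url: string } }
  | { type: "text"; text: string };

interface UserMessage {
  role: "user";
  content: ContentPart[];
}

// Assumption: the screenshot is a JPEG; change the MIME type if it is not.
function buildImageMessage(image: ImageInput): UserMessage {
  const content: ContentPart[] = [
    {
      type: "image_url",
      image_url: {
        url: `data:image/jpeg;base64,${image.buffer.toString("base64")}`,
      },
    },
  ];
  // The text part is only added when a description was supplied.
  if (image.description) {
    content.push({ type: "text", text: image.description });
  }
  return { role: "user", content };
}
```

The image part comes first and the optional description second, so a provider that ignores the text part still receives the screenshot.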
🦋 Changeset detected
Latest commit: ea66151
The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages.
This PR is from an external contributor and must be approved by a stagehand team member with write access before CI can run.
1 issue found across 3 files
Confidence score: 3/5
- There is a concrete regression risk in packages/core/lib/v3/external_clients/customOpenAI.ts: mutating options.messages in place can accumulate content across retries and produce duplicate image prompts.
- Given the high severity/confidence (7/10, 10/10) and direct user-facing behavior impact, this sits above minor-risk changes and warrants caution before merge.
- This is likely fixable in a focused way (e.g., avoiding shared mutable request state), so the PR is close but not quite low-risk yet.
- Pay close attention to packages/core/lib/v3/external_clients/customOpenAI.ts: retry logic may reuse mutated messages and duplicate prompts.
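To make the flagged risk concrete, the following sketch reproduces the accumulation in isolation. `attachImageInPlace` is a hypothetical stand-in, not the client's actual code; it only shows what happens when the same options object is reused across attempts and the image is pushed onto the shared array each time.

```typescript
type Message = { role: string; content: unknown };

// Hypothetical stand-in for a request path that mutates its input.
function attachImageInPlace(
  options: { messages: Message[] },
  imageMessage: Message,
): Message[] {
  options.messages.push(imageMessage); // mutates the caller's array
  return options.messages;
}

const options: { messages: Message[] } = {
  messages: [{ role: "user", content: "Find the login button" }],
};
const imageMessage: Message = {
  role: "user",
  content: [{ type: "image_url" }],
};

// Two attempts with the same options object, as a retry would make.
attachImageInPlace(options, imageMessage);
const secondAttempt = attachImageInPlace(options, imageMessage);

console.log(secondAttempt.length); // 3: one prompt plus two copies of the image
```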
Architecture diagram
sequenceDiagram
participant App as Application (Stagehand)
participant CLC as CustomOpenAIClient
participant ChatAPI as OpenAI Chat Completions API
participant Provider as OpenAI-compatible Provider (Ollama/LM Studio)
Note over App,Provider: NEW: Vision Support in CustomOpenAIClient
App->>CLC: createChatCompletion({ messages, image, ... })
CLC->>CLC: Process options (strip image & requestId)
CLC->>ChatAPI: POST /v1/chat/completions
alt image is provided
Note over CLC,ChatAPI: NEW: Build image message
CLC->>CLC: Create image_url part from image.buffer (base64)
opt image.description is provided
CLC->>CLC: Append text part with description
end
CLC->>CLC: Push { role: "user", content: [image_url, text?...] } to messages
ChatAPI->>Provider: Forward messages with image_url content
else no image
ChatAPI->>Provider: Forward messages as-is (no vision)
end
Provider-->>ChatAPI: Chat completion response
ChatAPI-->>CLC: Raw response object
CLC->>CLC: Parse usage and choices
CLC-->>App: Parsed completion result
Note over App,Provider: Error Handling
alt Provider fails or returns error
Provider-->>ChatAPI: Error response (e.g., 4xx/5xx)
ChatAPI-->>CLC: Throws error
alt retries > 0
CLC->>CLC: Retry logic (unchanged)
else no retries left
CLC-->>App: Propagate error
end
end
Copy options.messages before appending the screenshot so schema-validation retries reuse the original prompt instead of accumulating duplicate images. Co-authored-by: Cursor <cursoragent@cursor.com>
cursor review
Skipping Bugbot: Bugbot is disabled for this repository. Visit the Bugbot dashboard to update your settings.
Addressed in ea66151: vision images are appended to a local messages copy, not options.messages. Added tests for immutability and retry behavior. The line 81 finding is stale; that line is now imageParts.push, not a mutation of shared request state.
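The local-copy approach described in this reply might look roughly like the sketch below. `prepareMessages` and the `imageMessage` field are hypothetical names used for illustration; the PR's own code and tests may be structured differently.

```typescript
type Message = { role: string; content: unknown };

interface Options {
  messages: Message[];
  imageMessage?: Message; // hypothetical: a pre-built image_url user message
}

// Builds the request messages without touching options.messages.
function prepareMessages(options: Options): Message[] {
  const messages = [...options.messages]; // shallow copy of the caller's array
  if (options.imageMessage) {
    messages.push(options.imageMessage);
  }
  return messages;
}

const options: Options = {
  messages: [{ role: "user", content: "Find the login button" }],
  imageMessage: { role: "user", content: [{ type: "image_url" }] },
};

// Simulate an initial attempt and one retry from the same options object.
const firstAttempt = prepareMessages(options);
const retryAttempt = prepareMessages(options);

console.log(firstAttempt.length, retryAttempt.length, options.messages.length); // 2 2 1
```

A shallow copy is enough here because only the array is extended; the individual message objects are not modified.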
why
CustomOpenAIClient had a TODO for vision support — when screenshots were passed in, it logged a warning and dropped them. That broke observe/extract vision flows for OpenAI-compatible providers like Ollama and LM Studio.
what changed
test plan
Summary by cubic
Add vision support to CustomOpenAIClient by sending screenshots as image_url messages so screenshot-based flows work with OpenAI-compatible providers like Ollama and LM Studio. Also prevent message mutation so retries don't duplicate images.
New Features
- Sends the screenshot as image_url when options.image is provided, with an optional text description.
Bug Fixes
- Copies options.messages before attaching the image to avoid mutating prompts and duplicating images across retries; added tests for non-mutation and retry behavior.
Written for commit ea66151. Summary will update on new commits.