feat: implement vision support for CustomOpenAIClient #2207

Open
optimusbuilder wants to merge 2 commits into browserbase:main from optimusbuilder:feat/custom-openai-vision

@optimusbuilder optimusbuilder commented Jun 7, 2026

why

CustomOpenAIClient had a TODO for vision support — when screenshots were passed in, it logged a warning and dropped them. That broke observe/extract vision flows for OpenAI-compatible providers like Ollama and LM Studio.

what changed

  • Append image_url user messages when options.image is provided (mirrors OpenAIClient)
  • Include optional image description as a text content part
  • Unit tests for image with/without description and no-image case
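The first two bullets can be sketched as a small pure function. This is a minimal illustration, not the PR's actual code: `buildImageMessage`, `ImageInput`, and the local `ContentPart` type are hypothetical names, and the `image/jpeg` MIME type in the data URL is an assumption. Only the content-part shapes (`image_url` and `text`) follow the OpenAI chat format the description refers to.

```typescript
// Hypothetical shapes for illustration; not the actual Stagehand or OpenAI SDK types.
type ContentPart =
  | { type: "image_url"; image_url: { url: string } }
  | { type: "text"; text: string };

interface UserMessage {
  role: "user";
  content: ContentPart[];
}

interface ImageInput {
  // Structural stand-in for a Node Buffer: anything that can emit base64.
  buffer: { toString(encoding: "base64"): string };
  description?: string;
}

// Build one user message carrying the screenshot and, if present, its description.
// The default MIME type is an assumption made for this sketch.
function buildImageMessage(image: ImageInput, mimeType = "image/jpeg"): UserMessage {
  const content: ContentPart[] = [
    {
      type: "image_url",
      image_url: { url: `data:${mimeType};base64,${image.buffer.toString("base64")}` },
    },
  ];
  if (image.description) {
    // The optional description travels as a text part after the image.
    content.push({ type: "text", text: image.description });
  }
  return { role: "user", content };
}
```

Because the function returns a new message instead of pushing onto an existing array, the caller decides where it goes, which keeps the no-image case a simple "do not call it" branch.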

test plan

  • pnpm exec vitest run dist/esm/tests/unit/custom-openai-vision.test.js

Summary by cubic

Add vision support to CustomOpenAIClient by sending screenshots as image_url messages so screenshot-based flows work with OpenAI-compatible providers like Ollama and LM Studio. Also prevent message mutation so retries don’t duplicate images.

  • New Features

    • Append a user message with image_url when options.image is provided, with an optional text description.
  • Bug Fixes

    • Copy options.messages before attaching the image to avoid mutating prompts and duplicating images across retries; added tests for non-mutation and retry behavior.

Written for commit ea66151. Summary will update on new commits.

CustomOpenAIClient ignored screenshot inputs and logged a warning instead
of forwarding them to OpenAI-compatible providers. Append image_url user
messages the same way OpenAIClient does so observe/extract vision flows
work with Ollama, LM Studio, and other compatible backends.

Co-authored-by: Cursor <cursoragent@cursor.com>

changeset-bot Bot commented Jun 7, 2026

🦋 Changeset detected

Latest commit: ea66151

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
Name Type
@browserbasehq/stagehand Patch
@browserbasehq/stagehand-evals Patch
@browserbasehq/stagehand-server-v3 Patch

github-actions Bot commented Jun 7, 2026

Contributor

This PR is from an external contributor and must be approved by a stagehand team member with write access before CI can run.
Approving the latest commit mirrors it into an internal PR owned by the approver.
If new commits are pushed later, the internal PR stays open but is marked stale until someone approves the latest external commit and refreshes it.

github-actions Bot added labels Jun 7, 2026: external-contributor (Tracks PRs mirrored from external contributor forks.) and external-contributor:awaiting-approval (Waiting for a stagehand team member to approve the latest external commit.)

@cubic-dev-ai cubic-dev-ai Bot left a comment

Contributor

1 issue found across 3 files

Confidence score: 3/5

  • There is a concrete regression risk in packages/core/lib/v3/external_clients/customOpenAI.ts: mutating options.messages in place can accumulate content across retries and produce duplicate image prompts.
  • Given the high severity/confidence (7/10, 10/10) and direct user-facing behavior impact, this sits above minor-risk changes and warrants caution before merge.
  • This is likely fixable in a focused way (e.g., avoiding shared mutable request state), so the PR is close but not quite low-risk yet.
  • Pay close attention to packages/core/lib/v3/external_clients/customOpenAI.ts - retry logic may reuse mutated messages and duplicate prompts.
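The risk flagged in this review can be reproduced in isolation. The sketch below is a simplified simulation, not the code in customOpenAI.ts: `Message`, `Options`, and `sendWithInPlaceMutation` are hypothetical names, the image is reduced to a placeholder string, and every attempt is assumed to fail so that the loop always retries.

```typescript
// Hypothetical, simplified message shape; not the real Stagehand types.
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

interface Options {
  messages: Message[];
  image?: { base64: string };
}

// Simulates a client that appends the image IN PLACE and then retries.
// Returns how many messages would be sent on each attempt.
function sendWithInPlaceMutation(options: Options, retries: number): number[] {
  const sizes: number[] = [];
  for (let attempt = 0; attempt <= retries; attempt++) {
    if (options.image) {
      // Bug being illustrated: the caller's array grows on every attempt.
      options.messages.push({ role: "user", content: `[image:${options.image.base64}]` });
    }
    sizes.push(options.messages.length);
    // Assumption for the sketch: each attempt fails, so the loop continues.
  }
  return sizes;
}

const options: Options = {
  messages: [{ role: "user", content: "Describe the page" }],
  image: { base64: "aGk=" },
};
const sizes = sendWithInPlaceMutation(options, 2);
console.log(sizes); // [2, 3, 4]: one extra image message per retry
```

With two retries the final request carries three copies of the image, and the caller's `options.messages` is left with four entries instead of one.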
Architecture diagram
sequenceDiagram
    participant App as Application (Stagehand)
    participant CLC as CustomOpenAIClient
    participant ChatAPI as OpenAI Chat Completions API
    participant Provider as OpenAI-compatible Provider (Ollama/LM Studio)

    Note over App,Provider: NEW: Vision Support in CustomOpenAIClient

    App->>CLC: createChatCompletion({ messages, image, ... })

    CLC->>CLC: Process options (strip image & requestId)

    CLC->>ChatAPI: POST /v1/chat/completions

    alt image is provided
        Note over CLC,ChatAPI: NEW: Build image message
        CLC->>CLC: Create image_url part from image.buffer (base64)
        opt image.description is provided
            CLC->>CLC: Append text part with description
        end
        CLC->>CLC: Push { role: "user", content: [image_url, text?...] } to messages
        ChatAPI->>Provider: Forward messages with image_url content
    else no image
        ChatAPI->>Provider: Forward messages as-is (no vision)
    end

    Provider-->>ChatAPI: Chat completion response
    ChatAPI-->>CLC: Raw response object
    CLC->>CLC: Parse usage and choices
    CLC-->>App: Parsed completion result

    Note over App,Provider: Error Handling
    alt Provider fails or returns error
        Provider-->>ChatAPI: Error response (e.g., 4xx/5xx)
        ChatAPI-->>CLC: Throws error
        alt retries > 0
            CLC->>CLC: Retry logic (unchanged)
        else no retries left
            CLC-->>App: Propagate error
        end
    end

Comment thread packages/core/lib/v3/external_clients/customOpenAI.ts Outdated
Copy options.messages before appending the screenshot so schema-validation
retries reuse the original prompt instead of accumulating duplicate images.

Co-authored-by: Cursor <cursoragent@cursor.com>
@optimusbuilder

Author

cursor review

cursor Bot commented Jun 7, 2026

Skipping Bugbot: Bugbot is disabled for this repository. Visit the Bugbot dashboard to update your settings.

optimusbuilder commented Jun 7, 2026

Author

Addressed in ea66151 — vision images are appended to a local messages copy, not options.messages. Added tests for immutability and retry behavior. The line 81 finding is stale; that line is now imageParts.push, not a mutation of shared request state.
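A minimal sketch of the local-copy approach described in this reply, under stated assumptions: `ChatMessage`, `RequestOptions`, and `buildRequestMessages` are hypothetical names rather than identifiers from the PR, and the image is represented by a placeholder string instead of a real `image_url` part.

```typescript
// Hypothetical, simplified shapes; not the real Stagehand types.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface RequestOptions {
  messages: ChatMessage[];
  image?: { base64: string };
}

// Builds the payload for ONE attempt without touching requestOptions.messages.
function buildRequestMessages(requestOptions: RequestOptions): ChatMessage[] {
  // Shallow copy: a new array, so push() cannot leak into the caller's list.
  const messages = [...requestOptions.messages];
  if (requestOptions.image) {
    messages.push({ role: "user", content: `[image:${requestOptions.image.base64}]` });
  }
  return messages;
}

const requestOptions: RequestOptions = {
  messages: [{ role: "user", content: "Describe the page" }],
  image: { base64: "aGk=" },
};

// Simulate an initial attempt plus two retries; each one rebuilds from the original.
const attempts = [0, 1, 2].map(() => buildRequestMessages(requestOptions));
console.log(attempts.map((m) => m.length)); // [2, 2, 2]
console.log(requestOptions.messages.length); // 1
```

The copy is shallow, so the individual message objects are still shared with the caller. That is sufficient here because the sketch only appends to the array and never edits an existing message.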
