core(cua): Google function-response image handling hardcodes PNG mimeType #2046

@BABTUNA

Why

GoogleCUAClient currently hardcodes mimeType: "image/png" and strips only a PNG data URL prefix when building function-response image parts.

If the screenshot is a JPEG (or any other non-PNG) data URL, the declared mimeType no longer matches the image bytes, and because the prefix is not stripped, we may send the full data URL where the raw base64 payload is expected.

Current behavior

In packages/core/lib/v3/agent/GoogleCUAClient.ts:

  • function responses always set inlineData.mimeType to image/png
  • payload extraction strips only ^data:image/png;base64,
  • captureScreenshot() normalizes raw provider input to PNG data URLs
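
To make the failure mode concrete, here is a minimal sketch of the behavior described above. It is not the actual source: `stripPngPrefixOnly` and the sample base64 strings are hypothetical stand-ins, and only the regex and the hardcoded `image/png` come from the description.

```typescript
// Hypothetical stand-in for the current extraction logic (not copied from the repo).
const PNG_PREFIX = /^data:image\/png;base64,/;

function stripPngPrefixOnly(screenshot: string): string {
  // Only a PNG data URL prefix is removed; anything else passes through untouched.
  return screenshot.replace(PNG_PREFIX, "");
}

// Sample inputs (truncated, illustrative payloads).
const pngUrl = "data:image/png;base64,iVBORw0KGgo=";
const jpegUrl = "data:image/jpeg;base64,/9j/4AAQSkZJRg==";

// mimeType is hardcoded regardless of the input.
const pngPart = {
  inlineData: { mimeType: "image/png", data: stripPngPrefixOnly(pngUrl) },
};
const jpegPart = {
  inlineData: { mimeType: "image/png", data: stripPngPrefixOnly(jpegUrl) },
};

console.log(pngPart.inlineData.data); // raw base64 only
console.log(jpegPart.inlineData.data); // still the full data URL, labeled image/png
```

For the JPEG input, the part ends up with both problems at once: the wrong mimeType and an unstripped data URL in `data`.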

Proposed change

  • Parse MIME + base64 payload from screenshot data URLs
  • Use parsed MIME in function-response inlineData.mimeType
  • Fall back to image/png for raw base64 input and non-image data URLs to preserve compatibility
  • Add unit tests covering JPEG parsing and PNG fallback
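
A rough sketch of what the parsing could look like, under some assumptions: the names `parseScreenshotDataUrl`, `buildImagePart`, and `ParsedScreenshot` are hypothetical, and passing non-image data URLs through unchanged is my guess at what "preserve compatibility" should mean, not something settled here.

```typescript
// Hypothetical helper names; nothing here exists in GoogleCUAClient.ts today.
interface ParsedScreenshot {
  mimeType: string;
  data: string; // raw base64, without any data URL prefix
}

const FALLBACK_MIME_TYPE = "image/png";

// Matches base64 image data URLs such as "data:image/jpeg;base64,<payload>".
const IMAGE_DATA_URL = /^data:(image\/[a-z0-9.+-]+);base64,([\s\S]*)$/i;

function parseScreenshotDataUrl(input: string): ParsedScreenshot {
  const match = IMAGE_DATA_URL.exec(input);
  if (match) {
    const [, mimeType = FALLBACK_MIME_TYPE, data = ""] = match;
    // MIME types are case-insensitive, so normalize to lowercase.
    return { mimeType: mimeType.toLowerCase(), data };
  }
  // Assumption: raw base64 and non-image data URLs are passed through
  // unchanged and labeled as PNG, mirroring the existing behavior.
  return { mimeType: FALLBACK_MIME_TYPE, data: input };
}

// Builds the image part of a function response from a screenshot string.
function buildImagePart(screenshot: string) {
  const { mimeType, data } = parseScreenshotDataUrl(screenshot);
  return { inlineData: { mimeType, data } };
}

const jpegPartParsed = buildImagePart("data:image/jpeg;base64,/9j/4AAQSkZJRg==");
const rawPartParsed = buildImagePart("iVBORw0KGgo=");
console.log(jpegPartParsed.inlineData.mimeType, rawPartParsed.inlineData.mimeType);
```

Keeping the parser as a pure function should make the JPEG and fallback cases easy to unit test without constructing a full client.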

Suggested files

  • packages/core/lib/v3/agent/GoogleCUAClient.ts
  • packages/core/tests/unit/google-cua-client.test.ts

Acceptance criteria

  • function-response inlineData.mimeType matches screenshot data URL MIME (for image data URLs)
  • base64 payload excludes data URL prefix for PNG/JPEG
  • existing raw base64 behavior still works via PNG fallback
  • unit tests cover MIME parsing and fallback
