
Release v1.5.0 — Agentic Workflow Designer + cross-model compatibility#19

Merged: DoyleDev merged 9 commits into main from fix/cross-model-compat on Jun 11, 2026

@DoyleDev (Collaborator)

Lands everything for the v1.5.0 release onto main. PR #17 (cross-model compat) was closed without merging, and PR #18 (designer) was merged into this branch rather than main, so this single PR carries both onto the default branch.

Contents

Agentic Workflow Designer (was #18)

  • Drag-and-drop canvas: cells (model + tool subset + prompt) wired with flow/feedback edges
  • Gate cells route via a forced route_output verdict; feedback loops are bounded (editable per-cell revision cap, 25-run global budget)
  • Per-cell tool selection, live transcript drawer, workflows saved to ~/.mason/workflows/
  • Built-in Spec → Implement → Test → Review template
  • README section + docs/designer.png
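A rough sketch of what a forced verdict on a gate cell could look like, assuming an OpenAI-compatible chat-completions request where `tool_choice` names a specific function. Only the tool name route_output comes from this PR; the argument schema (`route`, `reason`), the helper names, and the route names are hypothetical.

```typescript
// Hypothetical gate-cell verdict tool. Only the name "route_output" is from
// the PR; the schema and helpers are illustrative assumptions.
interface ChatTool {
  type: "function";
  function: { name: string; description: string; parameters: Record<string, unknown> };
}

function routeOutputTool(routes: string[]): ChatTool {
  return {
    type: "function",
    function: {
      name: "route_output",
      description: "Choose where this gate sends its output.",
      parameters: {
        type: "object",
        properties: {
          route: { type: "string", enum: routes }, // one of the gate's outgoing edges
          reason: { type: "string" },
        },
        required: ["route", "reason"],
      },
    },
  };
}

// OpenAI-style tool_choice that forces the model to call route_output
// instead of answering in free text.
function forcedRouteChoice() {
  return { type: "function" as const, function: { name: "route_output" } };
}

// Validate the forced call's JSON arguments against the allowed routes.
function parseVerdict(argsJson: string, routes: string[]): { route: string; reason: string } {
  const args = JSON.parse(argsJson) as { route?: unknown; reason?: unknown };
  if (typeof args.route !== "string" || !routes.includes(args.route)) {
    throw new Error(`invalid route: ${String(args.route)}`);
  }
  return { route: args.route, reason: typeof args.reason === "string" ? args.reason : "" };
}
```

Forcing the call (rather than asking the model to mention a route) gives the engine a machine-checkable verdict; an unknown route fails loudly instead of being guessed.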

Cross-model compatibility (was #17)

  • Gate stream_options per model family (fixes HTTP 400 "unknown field stream_options" on Qwen 3.5 / Llama 4)
  • Collapse multiple system messages into one (fixes Gemini's single-system-prompt error)
  • Per-family max_tokens caps
  • Flatten array-shaped delta.content (fixes [object Object] on Qwen 3.5 / Gemini / gpt-oss)
  • npm run test:models cross-model sweep (81 scenarios)

Release

  • chore(release): 1.5.0 — package.json + manifest + lockfile

After merge

  • Re-point the draft v1.5.0 release tag from feat/workflow-designer to the main merge commit, then publish.

DoyleDev and others added 9 commits June 9, 2026 15:53
Three model-family bugs were live before this:

  • Qwen/Llama/gpt-oss returned 400 with "unknown field stream_options".
    Mason added stream_options.include_usage to every streaming
    request to surface Anthropic cache stats, but those providers
    reject the field.

  • Gemini returned 400 with "Gemini models only support one system prompt".
    Mason builds up to three system messages (skills manifest +
    user prompt + tool-aware nudge); Gemini permits exactly one.
    Other models tolerate multiple, so the bug went unnoticed.

  • Llama / Qwen 3 next returned 400 with "max_tokens cannot exceed N".
    The 16384 cap was sized for Opus extended thinking; Llama caps at
    8192 and Qwen 3 next at 10000.

Fixes:

  • src/chat-shared.ts (new): extracted flattenContent,
    applyAnthropicCaching, plus three new helpers — supportsStreamOptions
    (allowlist Claude / GPT / Codex / o-series; explicitly exclude
    gpt-oss), maxTokensFor (16384 for Claude, 10000 for Qwen 3 next,
    8192 default), consolidateSystemMessages (collapse multiple
    role:"system" into one with \n\n separator; universally
    compatible and preserves cache_control behavior).
  • src/main.ts: import from chat-shared, gate body.stream_options
    on supportsStreamOptions, run consolidateSystemMessages before
    applyAnthropicCaching, and use maxTokensFor for both chat-
    completions branches.
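A minimal sketch of the three new helpers, written only from the rules stated above. The substring and regex matching on endpoint names is an assumption (the real src/chat-shared.ts may identify families differently), messages are simplified to string content, the applyAnthropicCaching step is omitted, and buildStreamingBody is a hypothetical helper that shows how the pieces combine.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant" | "tool"; content: string };

// Allowlist Claude / GPT / Codex / o-series. gpt-oss is checked first
// because its name also contains "gpt".
function supportsStreamOptions(model: string): boolean {
  const m = model.toLowerCase();
  if (m.includes("gpt-oss")) return false;
  return (
    m.includes("claude") ||
    m.includes("gpt") ||
    m.includes("codex") ||
    /(^|[-_/])o\d/.test(m) // assumed o-series naming, e.g. "databricks-o3"
  );
}

// 16384 for Claude (room for Opus extended thinking), 10000 for Qwen 3 next,
// 8192 otherwise (Llama's limit, used as the safe default).
function maxTokensFor(model: string): number {
  const m = model.toLowerCase();
  if (m.includes("claude")) return 16384;
  if (/qwen-?3-next/.test(m)) return 10000; // assumed endpoint naming
  return 8192;
}

// Collapse every role:"system" message into one, joined with "\n\n".
// Assumption: the merged message goes first and the remaining messages
// keep their relative order.
function consolidateSystemMessages(messages: ChatMessage[]): ChatMessage[] {
  const system = messages.filter((msg) => msg.role === "system");
  if (system.length <= 1) return messages;
  const merged: ChatMessage = {
    role: "system",
    content: system.map((msg) => msg.content).join("\n\n"),
  };
  return [merged, ...messages.filter((msg) => msg.role !== "system")];
}

// Hypothetical: one streaming chat-completions body using all three helpers.
function buildStreamingBody(model: string, messages: ChatMessage[]): Record<string, unknown> {
  const body: Record<string, unknown> = {
    messages: consolidateSystemMessages(messages),
    stream: true,
    max_tokens: maxTokensFor(model),
  };
  // Only families known to accept stream_options get it at all.
  if (supportsStreamOptions(model)) {
    body.stream_options = { include_usage: true };
  }
  return body;
}
```

An allowlist (rather than a denylist) means a newly added model family defaults to the conservative request shape until it is known to accept stream_options.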

Regression coverage:

  • scripts/test-models.js (new): standalone Node sweep that mints
    OAuth via the local databricks CLI, discovers every chat model
    via /api/2.0/serving-endpoints, and runs three scenarios
    (hello-no-tools, hello-with-tools, multi-system) against each.
    Mirrors Mason's chat-handler logic by reusing the same helpers
    from build/ts/chat-shared.js. It also mirrors the chatLoop "promote to
    responses when tools + responses-supported" rule, so gpt-5-5
    tool tests are skipped (Mason never sends them via chat
    completions).
  • npm run test:models — runs the sweep. --filter <substr> scopes
    to a model subset; --profile <name> picks a non-DEFAULT
    databrickscfg profile.
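A hypothetical sketch of how the sweep could build its model × scenario matrix. The scenario names and the --filter substring semantics come from the commit message; buildMatrix and the usesResponsesApi predicate are assumptions standing in for the script's real code and Mason's actual promote-to-Responses rule.

```typescript
type Scenario = "hello-no-tools" | "hello-with-tools" | "multi-system";
const SCENARIOS: Scenario[] = ["hello-no-tools", "hello-with-tools", "multi-system"];

interface Case { model: string; scenario: Scenario; skip: boolean }

function buildMatrix(
  models: string[],
  usesResponsesApi: (model: string) => boolean, // assumed stand-in for the chatLoop rule
  filter?: string, // --filter <substr>
): Case[] {
  const selected = filter ? models.filter((m) => m.includes(filter)) : models;
  const cases: Case[] = [];
  for (const model of selected) {
    for (const scenario of SCENARIOS) {
      // With tools, Responses-capable models never go through chat
      // completions in Mason, so that combination is marked skipped.
      const skip = scenario === "hello-with-tools" && usesResponsesApi(model);
      cases.push({ model, scenario, skip });
    }
  }
  return cases;
}
```

Marking cases as skipped, rather than dropping them, keeps the matrix size predictable and makes the skip visible in the report.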

End-to-end result: 81/81 model+scenario combinations green on the
user's workspace (Claude, GPT, GPT-OSS, Gemini, Llama 3/4, Qwen 3
next + 3.5, Gemma).

Co-authored-by: Isaac
Qwen 3.5 122B, Gemini 2.5, and gpt-oss stream delta.content as an array
of content parts; appending it to a string yielded literal
"[object Object]" in the chat window. Coerce through flattenContent
before accumulating/emitting. Sweep now parses SSE deltas the same way
and fails on "[object Object]" in assembled output.
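A minimal sketch of the coercion described above; this is not Mason's actual flattenContent. It assumes OpenAI-style content parts of the form `{ type: "text", text }`, and assembleDeltas is a hypothetical accumulator showing where the coercion sits.

```typescript
type ContentPart = string | { type?: string; text?: unknown };

// Coerce delta.content to plain text. Arrays of content parts are joined;
// non-text parts and unknown shapes contribute nothing rather than being
// stringified to "[object Object]".
function flattenContent(content: unknown): string {
  if (typeof content === "string") return content;
  if (Array.isArray(content)) {
    return content
      .map((part: ContentPart) => {
        if (typeof part === "string") return part;
        if (part !== null && typeof part === "object" && typeof part.text === "string") {
          return part.text;
        }
        return "";
      })
      .join("");
  }
  return ""; // null, undefined, and other shapes
}

// Hypothetical accumulator: coerce each delta before appending it.
function assembleDeltas(deltas: Array<{ content?: unknown }>): string {
  let text = "";
  for (const delta of deltas) text += flattenContent(delta.content);
  return text;
}
```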

Co-authored-by: Isaac
…r.ts

resolveModelRouting (gateway/format resolution incl. tools->Responses
promotion), executeToolCore (headless load_skill/builtin/MCP dispatch),
and capToolResult move to src/agent-runner.ts so the upcoming workflow
engine can reuse them. getAllToolDefs gains an optional allowlist param
(narrowing only) for per-cell tool selection; renderQuestionCard gains
an optional container. Chat behavior unchanged.
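A hypothetical sketch of "narrowing only" for the per-cell allowlist. The real getAllToolDefs signature and tool-definition shape are not shown in this PR, so narrowToolDefs and ToolDef below are illustrative names: the allowlist can remove tools but can never introduce one that is not already available.

```typescript
interface ToolDef { name: string; description: string }

// No allowlist: every available tool (the chat path stays unchanged).
// With an allowlist: keep only available tools whose names are listed;
// listed names that are not available are ignored, never added.
function narrowToolDefs(all: readonly ToolDef[], allowlist?: readonly string[]): ToolDef[] {
  if (allowlist === undefined) return [...all];
  const allowed = new Set(allowlist);
  return all.filter((tool) => allowed.has(tool.name));
}
```

In this sketch an empty allowlist yields a cell with no tools, which is deliberately different from omitting the allowlist.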

Co-authored-by: Isaac
Visual node-based workflow designer (sidebar button above Profile,
Cmd+D). Cells select a model + per-cell tool subset + prompt; flow
edges pipe outputs downstream with labeled multi-input joins; dashed
feedback edges create bounded revision loops driven by a forced
route_output verdict tool on gate cells. Sequential execution through
the existing chat IPC; budgets (40 inner / 5 feedback / 25 global)
guard against runaway loops. Workflows persist to ~/.mason/workflows.
Includes the Spec → Implement → Test → Review template.
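A hypothetical sketch of two pieces implied above: ordering cells for sequential execution along flow edges, and building a labeled multi-input join. Feedback edges are assumed to be excluded from ordering because they would form cycles. The FlowEdge shape, the Kahn's-algorithm ordering, and the "### label" join format are assumptions, not Mason's actual implementation.

```typescript
interface FlowEdge { from: string; to: string; label: string }

// Kahn's algorithm over flow edges only: a cell runs once all of its
// upstream cells have run.
function executionOrder(cellIds: string[], edges: FlowEdge[]): string[] {
  const indegree = new Map<string, number>(cellIds.map((id): [string, number] => [id, 0]));
  for (const e of edges) indegree.set(e.to, (indegree.get(e.to) ?? 0) + 1);
  const queue = cellIds.filter((id) => indegree.get(id) === 0);
  const order: string[] = [];
  while (queue.length > 0) {
    const id = queue.shift()!;
    order.push(id);
    for (const e of edges) {
      if (e.from !== id) continue;
      const remaining = indegree.get(e.to)! - 1;
      indegree.set(e.to, remaining);
      if (remaining === 0) queue.push(e.to);
    }
  }
  if (order.length !== cellIds.length) throw new Error("flow edges contain a cycle");
  return order;
}

// Labeled join: each upstream output appears under its edge label.
function joinInputs(cellId: string, edges: FlowEdge[], outputs: Map<string, string>): string {
  return edges
    .filter((e) => e.to === cellId && outputs.has(e.from))
    .map((e) => `### ${e.label}\n${outputs.get(e.from)}`)
    .join("\n\n");
}
```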

Co-authored-by: Isaac
…e pressure

Loops were already bounded (5 revisions per cell, 25 cell runs globally,
gatekeeper forces a terminal route on exhaustion), but invisibly so.
Now: feedback-target cells show an editable revision cap (1-20) on the
card; status pills show loop progress (running 3/5); gates see
'revision N of M' annotations on revised inputs, plus per-route budget
state and an explicit don't-chase-perfection instruction. Verified
against a critic prompted never to be satisfied: it was forced to route
to end once the cap was hit, in 4 cell runs.
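A minimal sketch of the loop-pressure bookkeeping, assuming one revision counter per feedback-target cell plus a global run counter. Only the defaults (5 revisions per cell, 25 global runs, caps editable from 1 to 20), the "revision N of M" wording, and forcing a terminal route on exhaustion come from this commit; LoopBudget, gatekeep, and the route name "end" are hypothetical.

```typescript
class LoopBudget {
  private readonly revisions = new Map<string, number>();
  private runs = 0;
  private readonly caps: ReadonlyMap<string, number>;
  private readonly globalBudget: number;
  private readonly defaultCap: number;

  constructor(caps: ReadonlyMap<string, number> = new Map(), globalBudget = 25, defaultCap = 5) {
    for (const [cell, cap] of caps) {
      if (!Number.isInteger(cap) || cap < 1 || cap > 20) {
        throw new Error(`revision cap for ${cell} must be an integer from 1 to 20`);
      }
    }
    this.caps = caps;
    this.globalBudget = globalBudget;
    this.defaultCap = defaultCap;
  }

  capFor(cell: string): number {
    return this.caps.get(cell) ?? this.defaultCap;
  }

  // Count one cell run; false means the global budget is already spent.
  tryStartRun(): boolean {
    if (this.runs >= this.globalBudget) return false;
    this.runs += 1;
    return true;
  }

  // A feedback edge may fire only while both the cell's cap and the
  // global budget still have room.
  canRevise(cell: string): boolean {
    return (this.revisions.get(cell) ?? 0) < this.capFor(cell) && this.runs < this.globalBudget;
  }

  recordRevision(cell: string): void {
    this.revisions.set(cell, (this.revisions.get(cell) ?? 0) + 1);
  }

  // Annotation shown to a gate on a revised input.
  annotation(cell: string): string {
    return `revision ${this.revisions.get(cell) ?? 0} of ${this.capFor(cell)}`;
  }
}

// Gatekeeper: a requested feedback route the budget can no longer pay for
// is replaced by the terminal route.
function gatekeep(route: { name: string; feedbackTarget?: string }, budget: LoopBudget, terminal = "end"): string {
  if (route.feedbackTarget !== undefined && !budget.canRevise(route.feedbackTarget)) {
    return terminal;
  }
  return route.name;
}
```

Because the gatekeeper, not the model, has the final say, a critic that never approves still terminates once the cap is reached.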

Co-authored-by: Isaac
@DoyleDev merged commit 088bbb9 into main on Jun 11, 2026
0 of 3 checks passed