Epic: pi agent runner — pi.dev coding agent behind the wizard-runner flag
Summary
Add a second agent backend, pi, built on pi.dev
(@earendil-works/pi-coding-agent) — a Claude-Code-style coding agent —
selectable at runtime via a multivariate wizard-runner flag (anthropic | pi,
default anthropic). This de-risks the wizard's dependency on a single agent loop
(@anthropic-ai/claude-agent-sdk): an interchangeable coding-agent backend runs
against the same PostHog LLM gateway, the same skills, and the same TUI, so the two
can be A/B'd in production.
Why Pi
It's a coding agent — it owns the tool-calling loop and ships its own
Read/Write/Edit/Bash tools, so we wrap it rather than rebuild a loop.
Native file-based skills — SKILL.md + frontmatter, progressive disclosure,
discovered from scanned skill directories. This is the context-mill skill model,
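so a framework skill loads by placing it in a directory Pi scans.
registerProvider({ baseUrl, apiKey, api: 'anthropic-messages', headers }) matches the PostHog gateway transport (Anthropic Messages protocol, bearer auth, Bedrock-fallback + metadata/flag headers).
createAgentSession → session.prompt(text) → session.subscribe(event); custom tools via defineTool.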
Architecture
Two orthogonal axes: wizard-orchestrator and wizard-runner (anthropic vs pi).
Backend selection lives inside the linear mode. A backend is a drop-in for
runLinearProgram, with the shape
AgentBackend.run(session, config: ProgramRun, programConfig, boot: BootstrapResult),
and selectBackend(flags) resolves wizard-runner to one backend
(unknown/missing → anthropic). anthropic is the control (the claude-agent-sdk
path).
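The fallback rule above (unknown or missing flag values resolve to the control) can be sketched as follows. This is a minimal illustration, not the wizard's actual code: the `AgentBackend` shape is simplified to a single-argument `run`, and the two stub backends are hypothetical stand-ins.

```typescript
// Hypothetical sketch: names mirror the issue text, types are simplified stand-ins.
type RunnerName = "anthropic" | "pi";

interface AgentBackend {
  readonly name: RunnerName;
  // Simplified; the signature described in the issue also takes config and boot data.
  run(prompt: string): Promise<string>;
}

// Stub backends so the selection logic can be exercised in isolation.
const backends: Record<RunnerName, AgentBackend> = {
  anthropic: { name: "anthropic", run: async (p) => `anthropic:${p}` },
  pi: { name: "pi", run: async (p) => `pi:${p}` },
};

function isRunnerName(value: unknown): value is RunnerName {
  return value === "anthropic" || value === "pi";
}

// Unknown or missing flag values resolve to the control backend (anthropic).
function selectBackend(flags: Record<string, unknown>): AgentBackend {
  const value = flags["wizard-runner"];
  return isRunnerName(value) ? backends[value] : backends.anthropic;
}
```

Keeping the fallback inside `selectBackend` means a misconfigured or stale flag value cannot route a run to an unintended backend.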
The pi backend:
Registers the PostHog gateway as an anthropic-messages provider (baseUrl from getLlmGatewayUrlFromHost(host), apiKey = PostHog token, headers =
Bedrock-fallback + wizard metadata/flags). Model id matches the anthropic
backend for a clean A/B.
System prompt = wizard commandments (DefaultResourceLoader system-prompt
override).
Skills: installs the resolved framework skill into a Pi-scanned skill dir so Pi
discovers it.
Wizard capabilities (env-file ops, wizard_ask, PostHog data ops) as Pi tools.
Streams session.subscribe events onto the shared stream→TUI bridge; maps errors
to AgentErrorType.
Fail-closed security (canUseTool allowlist + YARA) on tool execution, matching the anthropic path.
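As a rough illustration of the first item in the list above, the provider registration payload could be assembled as below. Everything here is an assumption apart from the four field names taken from the issue text: `buildGatewayProvider`, its option names, and the header keys in the usage note are hypothetical, and the URL resolver is passed in rather than guessed.

```typescript
// Shape of the provider registration described in the issue text.
interface GatewayProviderConfig {
  baseUrl: string;
  apiKey: string;
  api: "anthropic-messages";
  headers: Record<string, string>;
}

// Hypothetical helper: option names are illustrative, not the wizard's API.
interface GatewayProviderOptions {
  host: string;
  token: string;
  // Injected resolver, e.g. getLlmGatewayUrlFromHost in the wizard codebase.
  resolveGatewayUrl: (host: string) => string;
  bedrockFallbackHeaders: Record<string, string>;
  wizardHeaders: Record<string, string>;
}

function buildGatewayProvider(opts: GatewayProviderOptions): GatewayProviderConfig {
  // Refuse to build a provider without credentials instead of sending anonymous requests.
  if (opts.token.length === 0) {
    throw new Error("missing PostHog token");
  }
  return {
    baseUrl: opts.resolveGatewayUrl(opts.host),
    apiKey: opts.token,
    api: "anthropic-messages",
    // Later spreads win, so wizard metadata overrides a clashing fallback header.
    headers: { ...opts.bedrockFallbackHeaders, ...opts.wizardHeaders },
  };
}
```

The returned object would then be handed to Pi's provider registration. Building it in a pure function keeps the gateway settings easy to compare between the two backends.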
Open questions (resolve in early sub-issues)
Does Pi mount MCP servers, or are PostHog data ops better expressed as custom
tools?
Does Pi's provider append /v1/messages to baseUrl, or is the full path needed?
How does Pi resolve a custom-provider model without interactive auth in
headless/CI runs?
How is per-tool gating exposed for the security layer (canUseTool + YARA)?
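Whatever hook Pi ends up exposing for the last question, the fail-closed behaviour itself can be sketched independently of it. The example below is an assumption-laden illustration: `makeGate`, `Verdict`, and `Scanner` are hypothetical names, and the scanner is a plain callback standing in for a YARA scan.

```typescript
type Verdict = { allowed: true } | { allowed: false; reason: string };

// Stand-in for a YARA scan: returns true when the serialized input is clean.
type Scanner = (serializedInput: string) => boolean;

// Hypothetical gate factory: allowlist check first, then content scan.
function makeGate(allowlist: ReadonlySet<string>, scan: Scanner) {
  return function canUseTool(toolName: string, input: unknown): Verdict {
    try {
      if (!allowlist.has(toolName)) {
        return { allowed: false, reason: `tool not allowlisted: ${toolName}` };
      }
      const serialized = JSON.stringify(input) ?? "";
      if (!scan(serialized)) {
        return { allowed: false, reason: "scanner rejected input" };
      }
      return { allowed: true };
    } catch (err) {
      // Fail closed: an error anywhere in the checks denies the call.
      return { allowed: false, reason: `gate error: ${String(err)}` };
    }
  };
}
```

The point of the `try`/`catch` is that a scanner crash or an unserializable input results in a denial rather than an unchecked tool call.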
Local testing (mprocs)
Running a non-anthropic backend locally has three requirements the harness must
satisfy:
Build channel. Flag overrides and the active runner are only meaningful in a
non-production build, but local runs need real prod-cloud OAuth/hosts. The CI
build channel (WIZARD_BUILD_NODE_ENV=ci) satisfies both — it keeps WIZARD_CI_FLAG_OVERRIDES live and targets prod cloud. A wizard-build-ci mproc
builds it.
Backend selection. Select the backend with WIZARD_CI_FLAG_OVERRIDES='{"wizard-runner":"pi"}' (or anthropic) in the
workbench .env — a flag, not a code change — so both backends run from one build.
Dependencies & observability. The local MCP (localhost:8787) must be running
for the full PostHog toolset; the backend degrades when it isn't. Runner
diagnostics go to the log file (the TUI owns the console), and each run dumps its
context (system prompt, tools, skills, model) for side-by-side comparison.
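To make the override format concrete, here is a minimal sketch of reading a JSON flag-override string such as the `WIZARD_CI_FLAG_OVERRIDES` value shown above. How the wizard actually parses this variable is not described in the issue, so `parseFlagOverrides` and its lenient behaviour (malformed input yields no overrides) are assumptions.

```typescript
// Hypothetical parser: accepts a JSON object of flag name to string variant.
function parseFlagOverrides(raw: string | undefined): Record<string, string> {
  if (!raw) return {};
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    // Malformed JSON is treated as "no overrides" in this sketch.
    return {};
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return {};
  }
  const overrides: Record<string, string> = {};
  for (const [key, value] of Object.entries(parsed)) {
    // Only string variants are kept; other value types are ignored.
    if (typeof value === "string") overrides[key] = value;
  }
  return overrides;
}
```

In a Node process this would be called as `parseFlagOverrides(process.env.WIZARD_CI_FLAG_OVERRIDES)`, and the result could feed a backend selector that falls back to `anthropic` when the `wizard-runner` key is absent.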
Sub-issues
#521: Runner seam + multivariate wizard-runner flag (anthropic | pi). AgentBackend + selectBackend, wired into linear.ts; anthropic control. (PR #692, feat(runner): agent-backend seam + multivariate wizard-runner flag (#521))
… stream→TUI. (landed across PRs #692–#694)
#523: Vercel AI SDK runner (vercel). Dropped; closed in favor of pi.
#524: pi.dev runner (pi). createAgentSession on the gateway, wizard tools as pi custom tools, stream→TUI, error mapping. (PRs #693, feat(runner): pi.dev backend behind wizard-runner=pi (#524); #694, feat(runner): wizard tools as pi custom tools — real integration on pi (#524))
#525: Security parity (canUseTool + YARA, fail-closed). (PR #697, feat(runner): fail-closed security parity on the pi backend (#525))
#526: Subagent / Task dispatch parity. Task/todo + controlled subagents. (PR #698, feat(runner): Task/todo + controlled subagents + logging parity on pi (#526))
… batch tool calls, 1M context. (PR #699, perf(pi): steer to native tools + cache skill menu; covers the perf work that landed alongside #524)
… hosted MCP + scrubbed-env lockdown. (PR #701, feat(pi): real PostHog MCP dashboard, env lockdown, perf parity)
Acceptance
pi runs a real integration end-to-end through the gateway, behind wizard-runner=pi, default-off; anthropic unchanged.
… anthropic).