Epic: Pluggable model-agnostic agent runners (anthropic | pi) #520

# Epic: `pi` agent runner — pi.dev coding agent behind the `wizard-runner` flag

## Summary

Add a second agent backend, **`pi`**, built on [pi.dev](https://pi.dev)
(`@earendil-works/pi-coding-agent`) — a Claude-Code-style coding agent —
selectable at runtime via a multivariate `wizard-runner` flag (`anthropic` | `pi`,
default `anthropic`). This de-risks the wizard's dependency on a single agent loop
(`@anthropic-ai/claude-agent-sdk`): an interchangeable coding-agent backend runs
against the same PostHog LLM gateway, the same skills, and the same TUI, so the two
can be A/B'd in production.

## Why Pi

- **It's a coding agent** — it owns the tool-calling loop and ships its own
  Read/Write/Edit/Bash tools, so we wrap it rather than rebuild a loop.
- **Native file-based skills** — `SKILL.md` + frontmatter, progressive disclosure,
  discovered from scanned skill directories. This is the context-mill skill model,
  so a framework skill loads by placing it in a directory Pi scans.
- **Speaks our gateway directly** — `registerProvider({ baseUrl, apiKey, api:
  'anthropic-messages', headers })` matches the PostHog gateway transport (Anthropic
  Messages protocol, bearer auth, Bedrock-fallback + metadata/flag headers).
- **In-process SDK** — `createAgentSession` → `session.prompt(text)` →
  `session.subscribe(event)`; custom tools via `defineTool`.
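
The provider registration named above takes one options object. As a rough sketch of how that object might be assembled, the block below builds it from plain inputs without importing Pi. Only the four option keys (`baseUrl`, `apiKey`, `api`, `headers`) come from this epic; the `GatewayInputs` type, the function name, and every header name are hypothetical placeholders.

```typescript
// Options shape named in this epic. The key names come from the text above;
// the TypeScript types are assumptions.
interface GatewayProviderOptions {
  baseUrl: string;
  apiKey: string;
  api: "anthropic-messages";
  headers: Record<string, string>;
}

// Hypothetical inputs for illustration only.
interface GatewayInputs {
  gatewayUrl: string;
  posthogToken: string;
  bedrockFallback: boolean;
  metadata: Record<string, string>;
}

function buildGatewayProviderOptions(input: GatewayInputs): GatewayProviderOptions {
  const headers: Record<string, string> = {};

  // Header names below are placeholders, not the gateway's real header names.
  if (input.bedrockFallback) {
    headers["x-example-bedrock-fallback"] = "true";
  }
  for (const key of Object.keys(input.metadata)) {
    const value = input.metadata[key];
    if (value !== undefined) {
      headers[`x-example-wizard-${key}`] = value;
    }
  }

  return {
    baseUrl: input.gatewayUrl,
    apiKey: input.posthogToken, // bearer auth, per the bullet above
    api: "anthropic-messages",
    headers,
  };
}
```

Keeping this a pure function means both backends could be checked against the same gateway settings in a unit test, without starting an agent session.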

## Architecture

Two orthogonal axes:

| Axis | Flag | Forks |
|---|---|---|
| Mode | `wizard-orchestrator` | linear vs orchestrator |
| Backend | `wizard-runner` | `anthropic` vs `pi` |

Backend selection lives inside the linear mode. A backend is a drop-in replacement for
`runLinearProgram`, with the signature `AgentBackend.run(session, config: ProgramRun,
programConfig, boot: BootstrapResult)`. `selectBackend(flags)` resolves `wizard-runner`
to a backend; an unknown or missing value falls back to `anthropic`, which is the
control (the claude-agent-sdk path).
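
The fallback rule above can be sketched as a small pure function. This is an illustration, not the code in `linear.ts`: `FlagValues`, `AgentBackendSketch`, and the `Sketch`-suffixed names are hypothetical, and the `run` signature is simplified from the real `AgentBackend.run(...)`.

```typescript
type RunnerName = "anthropic" | "pi";

// Hypothetical flag bag: flag key -> resolved variant.
type FlagValues = Record<string, string | boolean | undefined>;

const DEFAULT_RUNNER: RunnerName = "anthropic";

// Anything other than the exact string "pi" resolves to the control.
function resolveRunner(flags: FlagValues): RunnerName {
  return flags["wizard-runner"] === "pi" ? "pi" : DEFAULT_RUNNER;
}

// Simplified stand-in for the backend interface (assumption).
interface AgentBackendSketch {
  readonly name: RunnerName;
  run(prompt: string): Promise<string>;
}

const backendsSketch: Record<RunnerName, AgentBackendSketch> = {
  anthropic: { name: "anthropic", run: async (prompt) => `anthropic handled: ${prompt}` },
  pi: { name: "pi", run: async (prompt) => `pi handled: ${prompt}` },
};

function selectBackendSketch(flags: FlagValues): AgentBackendSketch {
  return backendsSketch[resolveRunner(flags)];
}
```

For example, `selectBackendSketch({ "wizard-runner": "vercel" }).name` is `"anthropic"`, because an unrecognized variant takes the same path as a missing flag.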

The `pi` backend:
- Registers the PostHog gateway as an `anthropic-messages` provider (`baseUrl` from
  `getLlmGatewayUrlFromHost(host)`, `apiKey` = posthog token, `headers` =
  Bedrock-fallback + wizard metadata/flags). Model id matches the `anthropic`
  backend for a clean A/B.
- Sets the system prompt to the wizard commandments via the `DefaultResourceLoader`
  system-prompt override.
- Installs the resolved framework skill into a Pi-scanned skill dir so Pi discovers
  it.
- Exposes wizard capabilities (env-file ops, `wizard_ask`, PostHog data ops) as Pi
  tools.
- Streams `session.subscribe` events onto the shared stream→TUI bridge and maps errors
  to `AgentErrorType`.
- Enforces fail-closed security (canUseTool allowlist + YARA) on tool execution,
  matching the `anthropic` path.
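
To make "fail-closed" concrete, here is a minimal sketch of a gate that runs before a tool executes. It assumes a hypothetical `ContentScanner` callback standing in for the YARA check and a plain allowlist set. None of these names come from the wizard or from Pi, and how Pi exposes per-tool gating is still an open question in this epic.

```typescript
type GateDecision =
  | { allowed: true }
  | { allowed: false; reason: string };

interface ToolCallSketch {
  toolName: string;
  input: string;
}

// Hypothetical stand-in for the YARA scan: returns matched rule names and may throw.
type ContentScanner = (content: string) => string[];

function gateToolCall(
  call: ToolCallSketch,
  allowlist: ReadonlySet<string>,
  scan: ContentScanner,
): GateDecision {
  // Deny by default: a tool that is not explicitly allowlisted never runs.
  if (!allowlist.has(call.toolName)) {
    return { allowed: false, reason: `tool not allowlisted: ${call.toolName}` };
  }
  try {
    const matches = scan(call.input);
    if (matches.length > 0) {
      return { allowed: false, reason: `scanner matched: ${matches.join(", ")}` };
    }
  } catch {
    // Fail closed: if the scanner itself errors, block the call rather than allow it.
    return { allowed: false, reason: "scanner error" };
  }
  return { allowed: true };
}
```

The property that matters is that every path other than "allowlisted and scanned clean" returns a denial, including the path where the scanner throws.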

## Sub-issues

- **#521 — Runner seam + multivariate `wizard-runner` flag** — `AgentBackend` +
  `selectBackend`, wired into `linear.ts`; `anthropic` control. *(PR #692)*
- **#522 — Shared runner contract + adapters** — gateway, MCP, prompt, tools,
  stream→TUI. *(landed across #692–#694)*
- **#523 — Vercel AI SDK runner (`vercel`)** — *dropped; closed in favor of `pi`.*
- **#524 — pi.dev runner (`pi`)** — `createAgentSession` on the gateway, wizard
  tools as pi custom tools, stream→TUI, error mapping. *(PRs #693, #694)*
- **#525 — Security parity (canUseTool + YARA, fail-closed)** — *(PR #697)*
- **#526 — Subagent / Task dispatch parity** — Task/todo + controlled subagents.
  *(PR #698)*
- **#527 — Rollout + per-variant observability (canary split)** — *(pending)*
- **#528 — Cross-runner parity test suite** — *(pending)*
- **#529 — Decide, ramp, and decouple** — *(pending)*
- **(new) — pi perf parity** — native ls/find/grep steering, skill-menu cache,
  batch tool calls, 1M context. *(PR #699; covers the perf work that landed
  alongside #524)*
- **(new) — pi PostHog data-ops via real MCP** — dashboard/insight tools over the
  hosted MCP + scrubbed-env lockdown. *(PR #701)*

## Open questions (resolve in early sub-issues)

- Does Pi mount MCP servers, or are PostHog data ops better expressed as custom
  tools?
- Does Pi's provider append `/v1/messages` to `baseUrl`, or is the full path needed?
- How does Pi resolve a custom-provider model without interactive auth in
  headless/CI runs?
- How is per-tool gating exposed for the security layer (canUseTool + YARA)?

## Local testing (mprocs)

Running a non-`anthropic` backend locally has three requirements the harness must
satisfy:

- **Build channel.** Flag overrides and the active runner are only meaningful in a
  non-production build, but local runs need real prod-cloud OAuth/hosts. The CI
  build channel (`WIZARD_BUILD_NODE_ENV=ci`) satisfies both — it keeps
  `WIZARD_CI_FLAG_OVERRIDES` live and targets prod cloud. A `wizard-build-ci` mproc
  builds it.
- **Backend selection.** Select the backend with
  `WIZARD_CI_FLAG_OVERRIDES='{"wizard-runner":"pi"}'` (or `anthropic`) in the
  workbench `.env` — a flag, not a code change — so both backends run from one build.
- **Dependencies & observability.** The local MCP (`localhost:8787`) must be running
  for the full PostHog toolset; the backend degrades when it isn't. Runner
  diagnostics go to the log file (the TUI owns the console), and each run dumps its
  context (system prompt, tools, skills, model) for side-by-side comparison.

Canonical flow:
```
# wizard-workbench/.env
WIZARD_PATH=/path/to/wizard
WIZARD_CI_FLAG_OVERRIDES={"wizard-runner":"pi"}
# mprocs: wizard-build-ci → mcp (localhost:8787) → wizard-run
```
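
The override value is a JSON object mapping flag keys to variants. As a sketch of how such a value could be read defensively, the function below takes the raw string as a parameter. The function name, the tolerance for surrounding single quotes, and the choice to ignore malformed input are all assumptions, not a description of the wizard's actual parser.

```typescript
type FlagOverrides = Record<string, string | boolean>;

function parseFlagOverrides(raw: string | undefined): FlagOverrides {
  if (!raw) return {};

  // Tolerate a value that still carries shell-style single quotes.
  const trimmed = raw.trim().replace(/^'(.*)'$/, "$1");

  let parsed: unknown;
  try {
    parsed = JSON.parse(trimmed);
  } catch {
    return {}; // malformed JSON: behave as if no overrides were set
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return {};
  }

  // Keep only string and boolean variants; drop anything else.
  const out: FlagOverrides = {};
  for (const key of Object.keys(parsed)) {
    const value = (parsed as Record<string, unknown>)[key];
    if (typeof value === "string" || typeof value === "boolean") {
      out[key] = value;
    }
  }
  return out;
}
```

With the `.env` line above, `parseFlagOverrides('{"wizard-runner":"pi"}')["wizard-runner"]` evaluates to `"pi"`.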

## Acceptance

- `pi` runs a real integration end-to-end through the gateway, behind
  `wizard-runner=pi`, default-off; `anthropic` unchanged.
- Skills load via Pi's native discovery (same framework skill as `anthropic`).
- Security parity (canUseTool + YARA) before any non-zero ramp.
- A documented, repeatable mprocs test flow.


