darkroomengineering · arzafran · Jun 10, 2026 · Jun 10, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -6,6 +6,14 @@ All notable changes to cc-settings are documented here.
 
 ## [Unreleased]
 
+### Delegation guidance — high-agency heuristic (issue #44)
+
+- **`CLAUDE-FULL.md`**: replaced the "repeated nag" MUST/SHOULD list with a per-decision heuristic table (3+ files / 10+ calls / security-sensitive → delegate, route by shape; NO → act directly). Four closing rules replace the old enforcement list; the briefing-contract blockquote is preserved verbatim.
+- **`src/hooks/tool-cadence.ts` (parallelmax branch)**: one nudge per streak (fires at 12+ calls OR 3+ distinct files edited, whichever comes first), followed by one escalation (soft block via `continueOnBlock`) if the streak continues past the reminder. Net: at most 2 signals per streak, down from one every 12 calls indefinitely. State tracks `files`, `nudged`, `countAtNudge`, `filesAtNudge`, `escalated`; old-shape state files handled with defensive defaults.
+- **`config/40-hooks.json`**: `continueOnBlock: true` on the tool-cadence PostToolUse hook so the escalation block surfaces as a hard-to-ignore signal without aborting the turn.
+- **`src/hooks/delegation-detector.ts`**: message compressed — single-line format with score, matched signals, and routing guide; overriding requires a stated reason.
+
+
 ### Cost tuning — "explore/execute cheap, decide on Fable"
 
 Fable stays the session default and the tier for judgment agents, but the high-volume read/execute agents move off it, since they were the bulk of the burn (each subagent re-reads the repo, and on a Fable session the inheriting agents all ran Fable).

diff --git a/CLAUDE-FULL.md b/CLAUDE-FULL.md
@@ -17,42 +17,36 @@ The Edit tool uses exact string matching. Follow these rules:
 
 ## Delegation
 
-> **Opus 4.8 note**: Claude Opus 4.8 (and 4.7 before it) spawns fewer subagents by default than 4.6 and prefers internal reasoning over tool/agent use. The rules below are **not suggestions** — they are explicit triggers to counter that bias. When a trigger fires, delegate. Do not reason your way out of delegating "because you could do it yourself."
+> **Opus 4.8 note**: Opus 4.8 (and 4.7) under-delegates — it prefers internal reasoning over spawning agents. The heuristic below is calibrated to counter that bias. Do not reason your way out of it "because you could do it yourself."
 
-### You MUST delegate (non-negotiable) when:
+### The per-decision heuristic
 
-- **Multi-file exploration spanning 3+ files** → `Agent(explore, "...")`
-- **Any task that would require 10+ sequential tool calls** → break into agent tasks
-- **Security-sensitive code** (auth, payments, crypto, input validation) → `Agent(security-reviewer, "...")`
-- **Writing new test files** → `Agent(tester, "...")`
-- **Dead code cleanup or codebase deslop** → `Agent(deslopper, "...")`
-- **Parallel independent workstreams** (3+ with no file conflicts) → spawn agents in a single message
+Before each unit of work, ask once: **3+ files, 10+ tool calls, or security-sensitive code?**
 
-### You SHOULD prefer delegation for:
+**YES → delegate first, then route by shape:**
 
-- **Genuinely complex implementation — 3+ files, or 10+ sequential tool calls** → `Agent(implementer, "...")`
-- **Architecture decisions or upfront planning** → `Agent(planner, "...")`
-- **Scaffolding new components/hooks/pages** → `Agent(scaffolder, "...")`
-- **Code review on changes touching 3+ files** → `Agent(reviewer, "...")`
-- **Expert second opinions / blast-radius / "why is this here" questions** → `Agent(explore, "...")`
-- **Full-feature orchestration across 3+ agents** → `Agent(maestro, "...")`
-- **Tasks structurally prone to single-window failure** — agentic laziness (quitting at 20 of 50 items), self-preferential bias (judging your own output), goal drift across compaction → a [dynamic workflow](https://code.claude.com/docs/en/workflows) / `/effort ultracode` (see `skills/orchestrate/SKILL.md`)
+| Shape | Agent |
+|---|---|
+| understand / find / map / blast-radius | `explore` |
+| build / change / fix across files | `implementer` |
+| plan / architecture | `planner` |
+| new test files | `tester` (MUST) |
+| auth, payments, crypto, input validation | `security-reviewer` (MUST) |
+| dead code / deslop | `deslopper` (MUST) |
+| 3+ independent workstreams | parallel `Agent` calls in ONE message (MUST) |
+| full feature spanning 3+ agents | `maestro` |
+| prone to single-window failure — agentic laziness (quitting at 20 of 50 items), self-preferential bias (judging your own output), goal drift across compaction | a [dynamic workflow](https://code.claude.com/docs/en/workflows) / `/effort ultracode` (see `skills/orchestrate/SKILL.md`) |
 
-> **Briefing contract for `implementer`**: as a subagent it gets only your prompt — no conversation context, none of the files you've read — so every prompt MUST contain actual content, not references: the user's ask verbatim, exact file paths and line ranges, the change to make (paste the planner output; never write "based on findings" or "according to plan"), the verification command, and a scope boundary. Thin prompts cause regressions; the agent will refuse them. It runs in the live working tree and leaves changes **uncommitted** for you to review before they land. Full contract: `agents/implementer.md` REQUIRED BRIEFING. This applies equally to `explore` → `implementer` and `planner` → `implementer` chains.
-
-### Act directly ONLY when:
+**NO → act directly.** 1–2 file edits, known-path reads, single greps/globs, build/test runs, conversational answers. Keeping small diffs in the main session is correct — don't spawn an `implementer` to prove you delegated.
 
-- Reading a specific file you already know the path to
-- Small or medium edits spanning 1–2 files, or any change you can finish in under ~10 tool calls
-- Running build or test commands
-- Simple searches (one grep for a string, one glob for a file pattern)
-- Answering a conversational question with no code change
+**Rules that close the loop:**
 
-### Enforcement Rules
+1. **Re-ask when scope grows.** Predicted small but it's now 3+ files or 10+ calls? Stop and delegate the remainder — sunk tool calls are not a reason to finish solo.
+2. **Overriding a YES requires a stated reason.** One line, in your response, before proceeding (e.g. "12 calls but all sequential edits to one file"). The `tool-cadence` hook escalates on streaks that continue past a reminder with no Agent call.
+3. **Delegating needs no narration** — just call the Agent tool.
+4. **Parallelize**: independent delegations go in a single message — they run concurrently.
 
-1. **Parallelize**: when multiple delegations have no dependencies, send all `Agent` calls in a single message — they run concurrently.
-2. **Don't narrate the decision**: if a trigger fires, call the Agent tool directly. Don't explain why you're delegating — just delegate.
-3. **Match the tool to the size**: the MUST/SHOULD triggers fire only for genuinely heavy work (3+ files, 10+ tool calls, security-sensitive code, new test files). Small and medium edits staying in the main session is the correct default — it keeps the diff reviewable in the working tree before commit. Don't spawn an `implementer` just to prove you delegated.
+> **Briefing contract for `implementer`**: as a subagent it gets only your prompt — no conversation context, none of the files you've read — so every prompt MUST contain actual content, not references: the user's ask verbatim, exact file paths and line ranges, the change to make (paste the planner output; never write "based on findings" or "according to plan"), the verification command, and a scope boundary. Thin prompts cause regressions; the agent will refuse them. It runs in the live working tree and leaves changes **uncommitted** for you to review before they land. Full contract: `agents/implementer.md` REQUIRED BRIEFING. This applies equally to `explore` → `implementer` and `planner` → `implementer` chains.
 
 For full orchestration mode, activate `profiles/maestro.md`. Model routing per agent: see `docs/agent-models.md`.
 

diff --git a/config/40-hooks.json b/config/40-hooks.json
@@ -108,7 +108,8 @@
 					{
 						"type": "command",
 						"command": "bun \"$HOME/.claude/src/hooks/tool-cadence.ts\"",
-						"timeout": 3
+						"timeout": 3,
+						"continueOnBlock": true
 					}
 				]
 			},

diff --git a/docs/hooks-reference.md b/docs/hooks-reference.md
@@ -315,7 +315,7 @@ Matchers filter which specific tool invocations or events trigger a hook.
 | Script | Purpose | Async |
 |--------|---------|-------|
 | `session-title.ts` | Derives session title from first prompt; emits `hookSpecificOutput.sessionTitle` so `claude --resume <name>` works | Yes |
-| `delegation-detector.ts` | Regex-scores incoming prompt for breadth signals (phrases like "do all", "across the repo", path-shaped tokens, large numbered lists). At score ≥ 2, injects a system reminder via `additionalContext` pointing at maestro / multi-agent delegation | No |
+| `delegation-detector.ts` | Regex-scores incoming prompt for breadth signals (phrases like "do all", "across the repo", path-shaped tokens, large numbered lists). At score ≥ 2, injects a compact `additionalContext` reminder: score, matched signals, and a one-line routing guide (maestro / implementer / parallel agents in ONE message). Overriding requires a stated reason. | No |
 
 > Note: Since v2.1.108 Claude Code has a native `Skill` tool that auto-matches skills; the old `skill-activation` hook was removed. Correction detection was removed as low-signal.
 
@@ -356,7 +356,7 @@ Logs are used by `bun run claude-audit` to analyze command patterns, security co
 
 | Script | Purpose | Async |
 |--------|---------|-------|
-| `tool-cadence.ts` (parallelmax branch) | Counts consecutive non-Agent tool calls in main context. At N=8, emits a system reminder pointing at the delegation rules. State at `~/.claude/tmp/parallelmax-counter.json`; resets when an Agent tool fires. 60s debounce | No |
+| `tool-cadence.ts` (parallelmax branch) | Counts consecutive non-Agent tool calls and distinct file edits per streak. **One nudge per streak**: fires at threshold 12 calls OR 3+ files edited — emits a compact `additionalContext` reminder with the delegation heuristic. **One escalation per streak**: if the streak continues past the nudge by another threshold-worth of calls or 2+ more files, emits a soft block (`continueOnBlock: true`) via `blockDecision` — the turn continues but the signal is hard to ignore. Both signals suppress when the review queue is at capacity. Resets on any Agent call. 60s debounce. State at `~/.claude/tmp/parallelmax-counter.json`. `CC_PARALLELMAX_THRESHOLD` env override (default 12). | No |
 
 ### PostToolUseFailure
 

diff --git a/src/hooks/delegation-detector.ts b/src/hooks/delegation-detector.ts
@@ -56,13 +56,11 @@ async function main(): Promise<void> {
 
   if (score < 2) return;
 
-  const reasonList = reasons.map((r) => `  • ${r}`).join("\n");
   const msg =
-    `Breadth signals detected in this prompt (score ${score}):\n${reasonList}\n\n` +
-    `This prompt likely spans multiple files or requires parallel workstreams. ` +
-    `Per CLAUDE.md delegation rules: use Agent(maestro) for full-feature orchestration, ` +
-    `Agent(implementer) for multi-file implementation, or spawn multiple parallel agents ` +
-    `in a SINGLE message. Do not self-execute tasks that trigger the 3+ file or 10+ tool-call thresholds.`;
+    `Breadth signals in this prompt (score ${score}): ${reasons.join("; ")}. ` +
+    `Likely 3+ files / parallel workstreams — per CLAUDE.md: route to Agent(maestro) for orchestration, ` +
+    `Agent(implementer) for multi-file changes, or parallel agents in ONE message. ` +
+    `Overriding requires a one-line stated reason.`;
 
   console.log(
     JSON.stringify({

diff --git a/src/hooks/tool-cadence.ts b/src/hooks/tool-cadence.ts
@@ -3,7 +3,9 @@
 // unmatched on EVERY PostToolUse (two Bun spawns per tool call → one):
 //
 //   • parallelmax branch (was parallelmax-nudge.ts) — counts consecutive
-//     non-Agent tool calls; at 8, nudges the model toward delegation.
+//     non-Agent tool calls and tracked files; fires one soft nudge per streak
+//     at THRESHOLD calls or FILES_THRESHOLD distinct file edits, then one
+//     escalation (soft block via continueOnBlock) if the streak continues.
 //   • review-queue branch (was review-queue-nudge.ts) — backpressure, the
 //     consumer-side counterpart: counts Agent spawns since the last commit,
 //     nudges at CC_MAX_UNREVIEWED, drains on successful commit/push,
@@ -15,7 +17,13 @@
 // runHook — any error → silent success, never break a tool call.
 
 import { runGit } from "../lib/git.ts";
-import { readHookInput, readState, runHook, writeState } from "../lib/hook-runtime.ts";
+import {
+  blockDecision,
+  readHookInput,
+  readState,
+  runHook,
+  writeState,
+} from "../lib/hook-runtime.ts";
 import {
   type BashResult,
   buildNudge,
@@ -41,19 +49,39 @@ import {
 // routine multi-step edits don't trip it, low enough to catch genuine
 // "should have fanned out" runs.
 const THRESHOLD = Number(process.env.CC_PARALLELMAX_THRESHOLD) || 12;
+// Distinct file edits before the nudge fires (regardless of call count).
+const FILES_THRESHOLD = 3;
 const DEBOUNCE_MS = 60_000;
 const COUNTER_STATE = "parallelmax-counter.json";
 const QUEUE_STATE = "review-queue.json";
 
+// File-edit tools and the field carrying the path in tool_input.
+const FILE_EDIT_TOOLS: Record<string, string> = {
+  Write: "file_path",
+  Edit: "file_path",
+  MultiEdit: "file_path",
+  NotebookEdit: "notebook_path",
+};
+
 interface CounterState {
   count: number;
   lastTool: string;
   firedAt?: number;
+  files?: string[];
+  nudged?: boolean;
+  countAtNudge?: number;
+  filesAtNudge?: number;
+  escalated?: boolean;
 }
 
 type Payload = {
   tool_name: string;
-  tool_input: { command?: string; subagent_type?: string };
+  tool_input: {
+    command?: string;
+    subagent_type?: string;
+    file_path?: string;
+    notebook_path?: string;
+  };
   tool_response: BashResult;
   cwd?: string;
 };
@@ -74,41 +102,90 @@ async function currentHead(cwd: string): Promise<string | undefined> {
 
 // --- Branch 1: consecutive non-Agent call counter --------------------------
 
-async function parallelmaxBranch(toolName: string): Promise<void> {
-  const state = await readState<CounterState>(COUNTER_STATE, { count: 0, lastTool: "" });
+async function parallelmaxBranch(
+  toolName: string,
+  toolInput: Payload["tool_input"],
+): Promise<void> {
+  // Defensive defaults for old-shape state files.
+  const raw = await readState<CounterState>(COUNTER_STATE, { count: 0, lastTool: "" });
+  const state = {
+    count: raw.count ?? 0,
+    lastTool: raw.lastTool ?? "",
+    firedAt: raw.firedAt as number | undefined,
+    files: raw.files ?? ([] as string[]),
+    nudged: raw.nudged ?? false,
+    countAtNudge: raw.countAtNudge ?? 0,
+    filesAtNudge: raw.filesAtNudge ?? 0,
+    escalated: raw.escalated ?? false,
+  };
 
   if (toolName === "Agent") {
+    // Reset the whole streak on delegation; preserve firedAt for debounce.
     await writeState(COUNTER_STATE, {
       count: 0,
       lastTool: toolName,
       firedAt: state.firedAt,
+      files: [],
+      nudged: false,
+      countAtNudge: 0,
+      filesAtNudge: 0,
+      escalated: false,
     });
     return;
   }
 
   state.count += 1;
   state.lastTool = toolName;
 
-  if (state.count >= THRESHOLD) {
-    const now = Date.now();
-    // Debounce: skip if we fired recently to avoid spamming.
-    if (!state.firedAt || now - state.firedAt >= DEBOUNCE_MS) {
-      // Suppress the "delegate more" nudge when the review queue is already at/
-      // over capacity: pushing more production when review is the bottleneck is
-      // the orchestration-tax failure mode (the review-queue branch covers the
-      // other direction). Still debounce + reset so we don't re-check on every
-      // call.
-      const rq = await readState<{ awaiting: number }>(QUEUE_STATE, { awaiting: 0 });
-      if (rq.awaiting < maxUnreviewed()) {
-        const msg =
-          `You have made ${state.count} consecutive tool calls without delegating to an Agent. ` +
-          `Opus 4.8 defaults to self-execution, but CLAUDE.md requires delegation when tasks span ` +
-          `3+ files or 10+ tool calls. Consider Agent(implementer), Agent(explore), or Agent(maestro) ` +
-          `to parallelize work, reduce context pressure, and follow the project guardrails.`;
-        emit(msg);
-      }
+  // Track file edits (deduplicated, capped at 20).
+  const filePathField = FILE_EDIT_TOOLS[toolName];
+  if (filePathField) {
+    const fp = toolInput[filePathField as keyof typeof toolInput];
+    if (typeof fp === "string" && fp && !state.files.includes(fp)) {
+      state.files = [...state.files, fp].slice(0, 20);
+    }
+  }
+
+  const now = Date.now();
+  const debounceOk = !state.firedAt || now - state.firedAt >= DEBOUNCE_MS;
+
+  // --- Escalation (at most once per streak) ---------------------------------
+  // Fires AFTER nudge, when the streak continues past the reminder.
+  if (
+    state.nudged &&
+    !state.escalated &&
+    (state.count - state.countAtNudge >= THRESHOLD ||
+      state.files.length - state.filesAtNudge >= 2) &&
+    debounceOk
+  ) {
+    const rq = await readState<{ awaiting: number }>(QUEUE_STATE, { awaiting: 0 });
+    if (rq.awaiting < maxUnreviewed()) {
+      state.escalated = true;
+      state.firedAt = now;
+      // Write state BEFORE calling blockDecision (never returns).
+      await writeState(COUNTER_STATE, state);
+      blockDecision(
+        `Delegation violation: ${state.count} tool calls and ${state.files.length} file(s) edited in this streak — past a prior reminder, still no Agent call. This matches a CLAUDE.md MUST-delegate trigger. Delegate the remainder (Agent(implementer) for changes, Agent(explore) for reads) or state a one-line justification before continuing.`,
+      );
+    }
+  }
+
+  // --- Soft nudge (at most once per streak) ---------------------------------
+  if (
+    !state.nudged &&
+    (state.count >= THRESHOLD || state.files.length >= FILES_THRESHOLD) &&
+    debounceOk
+  ) {
+    const rq = await readState<{ awaiting: number }>(QUEUE_STATE, { awaiting: 0 });
+    if (rq.awaiting < maxUnreviewed()) {
+      const filesClause = state.files.length > 0 ? ` and ${state.files.length} file(s) edited` : "";
+      emit(
+        `Delegation check — ${state.count} tool calls${filesClause} in this streak with no Agent call. Heuristic: 3+ files or 10+ calls → delegate (explore = read/map, implementer = multi-file change, parallel agents for independent work). Delegate the remainder now, or state a one-line reason for staying solo and continue.`,
+      );
+      state.nudged = true;
+      state.countAtNudge = state.count;
+      state.filesAtNudge = state.files.length;
       state.firedAt = now;
-      state.count = 0;
     }
   }
 
@@ -180,17 +257,16 @@ async function main(): Promise<void> {
   const toolName = payload.tool_name ?? "";
 
   // Branch isolation: a crash in one branch must not silence the other —
-  // before the merge these were separate processes. Order matches the old
-  // config order (parallelmax first), so on the rare event where both nudge
-  // (e.g. an 8th consecutive call that is also a draining commit) the two
-  // JSON lines print in the same order as before.
+  // before the merge these were separate processes. reviewQueueBranch runs
+  // first because parallelmaxBranch may end the process via blockDecision()
+  // on escalation; the review-queue state must already be written by then.
   try {
-    await parallelmaxBranch(toolName);
+    await reviewQueueBranch(payload, toolName);
   } catch {
     // fail open
   }
   try {
-    await reviewQueueBranch(payload, toolName);
+    await parallelmaxBranch(toolName, payload.tool_input ?? {});
   } catch {
     // fail open
   }