ksek87 · May 25, 2026
diff --git a/README.md b/README.md
@@ -74,9 +74,9 @@ An agent chains tool calls across multiple steps. It iterates and adapts when it
 
 ### MCP Tool Poison Detection — `fuzzd scan`
 
-Static analysis of `tool.description` fields across **three detection passes**:
+Static analysis of `tool.description` and `inputSchema` fields across **three detection passes**:
 
-1. **125 Aho-Corasick pattern needles** — single O(N) sweep across all patterns simultaneously, 13 detection signals. Critical/High severity.
+1. **155 Aho-Corasick pattern needles** — single O(N) sweep across all patterns simultaneously, 21 detection signals. Critical/High severity.
 2. **Structural heuristic** — 10-word sliding window for universal-scope relay/inclusion constructs (verb + quantifier + noun). Medium severity.
 3. **Semantic verb scanner** — Template-3 "when using X, VERB" extraction with GloVe 50d word-vector neighbourhood matching. Catches attack synonyms (reroute, supplant, mutate) not enumerable as AC needles. Medium severity.
 
@@ -95,6 +95,13 @@ Static analysis of `tool.description` fields across **three detection passes**:
 | `conditional_activation` | `.mcp-triggered`, "if previously triggered" (rug-pull sleeper) |
 | `message_hijacking` | "forward all", "relay all", "change the recipient to", "add to the bcc", "proxy number" |
 | `unicode_obfuscation` | U+200B zero-width space, U+200C/D invisible joiners (Noma Security) |
+| `ansi_escape_obfuscation` | ANSI terminal escape sequences hiding instructions (Trail of Bits, Apr 2025) |
+| `tool_selection_bias` | "deprecated", "recommended version", "supersedes" — biases LLM tool selection |
+| `identity_impersonation` | "official Anthropic", "elevated trust", "platform administrator" |
+| `raw_content_passthrough` | "do not truncate", "without filtering" — disables summarisation to preserve injected payloads |
+| `value_substitution` | "canonical form", "convert all X→Y" — maps user arguments to attacker values |
+| `tool_enumeration_recon` | "tools/list", "survey all active tools" — reconnaissance for follow-up attacks |
+| `sampling_pipeline_hijack` | "route all queries through", "all queries must pass through" — captures full LLM pipeline |
 
 ```
 $ fuzzd scan --schema tools.json
@@ -276,8 +283,8 @@ fuzzd/
     │   ├── harness.rs              # Harness<T>: enumerate_tools() with cache, call_tool()
     │   └── observer.rs             # Observer<T>: intercepts responses, runs ResponseScanner
     ├── fuzzer/
-    │   ├── mod.rs                  # Signal (14 variants), Finding, Pattern, Scanner (const-constructible)
-    │   ├── description.rs          # DescriptionScanner — 125 AC patterns + structural + semantic verb scanner
+    │   ├── mod.rs                  # Signal (21 variants), Finding, Pattern, Scanner (const-constructible)
+    │   ├── description.rs          # DescriptionScanner — 155 AC patterns + structural + semantic verb scanner
     │   ├── response.rs             # ResponseScanner — 20 patterns for tool response injection
     │   ├── argument.rs             # ArgumentFuzzer — JSON Schema boundary mutation
     │   └── payloads.rs             # 8 injection payload categories + 22 integer boundaries
@@ -315,8 +322,8 @@ fuzzd/
 | 5 | v0.5 — MCPTox/MCPSecBench corpus expansion (27 records) | ✅ Done |
 | 6 | v0.6 — Observer + response scanner (prompt injection in tool output) | ✅ Done |
 | 7 | v0.7 — SARIF/JSON/Markdown reporter, wired audit command, benchmark subcommand | ✅ Done |
-| 8 | v0.8 — Suppression workflow (stable finding IDs, suppression file, GitHub Code Scanning) | 🔜 Next |
-| 9 | v0.9 — Coverage completeness (schema field scanning, ANSI escape, new signal classes) | 🔜 Planned |
+| 8 | v0.8 — Suppression workflow (stable finding IDs, suppression file, GitHub Code Scanning) | ✅ Done |
+| 9 | v0.9 — Coverage completeness (schema field scanning, ANSI escape, new signal classes) | ✅ Done |
 | 10 | v0.10 — Semantic detection layer (embedding-based similarity) | 🔜 Planned |
 | 11 | v0.11 — GitHub Action (Marketplace) | 🔜 Planned |
 | 12 | v0.12 — Package-level scanning (`--package @scope/mcp-server`) | 🔜 Planned |
@@ -329,22 +336,6 @@ fuzzd/
 
 ### Upcoming milestone detail
 
-**v0.8 — Suppression workflow** ([#42](https://github.com/ksek87/fuzzd/issues/42))
-
-Makes fuzzd usable as a persistent CI gate. Without this, every human-reviewed false positive re-fires on the next scan and re-blocks the pipeline — teams work around it by disabling the scan entirely. Three parts in dependency order:
-
-1. **Stable finding fingerprints** — each `Finding` carries an ID derived from `tool_name + signal` (not the matched-text snippet, which changes when descriptions are edited). This ID becomes the `ruleId` in SARIF output and the key in the suppression file.
-2. **Suppression file** (`.fuzzd/suppress.toml`) — repo-local, checked into source control. Each entry records the tool, signal, and a required `reason` string. Suppressed findings still print as `[suppressed]` — they are not silently hidden — but do not count toward the exit-1 threshold. `fuzzd suppress <tool> <signal> --reason "..."` writes the entry.
-3. **GitHub Code Scanning integration** — with stable `ruleId` and `partialFingerprints` populated in SARIF output, findings uploaded via `github/codeql-action/upload-sarif` appear in the Security tab. Human dismissals persist across scans natively — no suppression file needed for GitHub-hosted workflows.
-
-**v0.9 — Coverage completeness**
-
-Closes the detection gaps identified by cross-benchmark analysis against MCPTox [^1], MCPSecBench [^2], MCP-SafetyBench [^16], and the MCP-UPD parasitic toolchain research [^9]. Eight issues tracked (#34–#41):
-
-- **Schema field poisoning** ([#34](https://github.com/ksek87/fuzzd/issues/34)) — Extend the scanner to `inputSchema` property descriptions, enum values, and defaults. CyberArk's "Poison Everywhere" analysis [^15] and MCP-UPD [^9] (27.2% of 1,360 servers vulnerable) document this as the primary bypass vector for description-only scanners. VIPER-MCP [^18] independently treats `inputSchema` parameter fields as attacker-controlled taint sources. Highest-priority gap.
-- **ANSI escape obfuscation** ([#35](https://github.com/ksek87/fuzzd/issues/35)) — Detect ANSI terminal control codes and escape sequences injected into tool output (Trail of Bits, Apr 2025 [^14]).
-- **New signal classes** ([#36](https://github.com/ksek87/fuzzd/issues/36)–[#41](https://github.com/ksek87/fuzzd/issues/41)) — `tool_selection_bias` (MCPSecBench [^2], MCPLIB [^17]), `identity_impersonation` (Zhao et al. [^11]), `raw_content_passthrough` (MCP-UPD [^9]), `value_substitution` (MCP-SafetyBench [^16]), tool enumeration reconnaissance, `sampling_pipeline_hijack` (Breaking the Protocol [^12]).
-
 **v0.10 — Semantic detection layer**
 Expand the semantic verb-synonym scanner to a full embedding-based similarity pass. Targets the application-specific redirect language that pattern needles cannot cover — the primary driver of the Message Hijacking (46.6%) and Privacy Leakage (61.8%) detection gaps. Implementation: `fastembed-rs` + quantized BAAI/bge-small-en-v1.5 model (~38MB, cached in `~/.fuzzd/models/`), activated via `--semantic` flag. Local only; no API dependency in CI.
 

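Review note on the `inputSchema` scanning described in the README hunks above: a minimal sketch of how schema fields could be flattened into text chunks before the three passes run. The `Json` enum and `schema_chunks` function are hypothetical names for illustration, not fuzzd's actual types; a real implementation would more likely walk a `serde_json::Value`. The sketch assumes only top-level `properties` are visited and that only string-valued fields are scanned.

```rust
/// Hypothetical stand-in for a parsed JSON value (assumption: not fuzzd's real type).
#[derive(Debug)]
enum Json {
    Str(String),
    Arr(Vec<Json>),
    Obj(Vec<(String, Json)>),
    Other, // numbers, booleans, null: ignored in this sketch
}

impl Json {
    /// Looks up a key when the value is an object; returns None otherwise.
    fn get(&self, key: &str) -> Option<&Json> {
        match self {
            Json::Obj(fields) => fields.iter().find(|(k, _)| k == key).map(|(_, v)| v),
            _ => None,
        }
    }
}

/// Collects (location, text) pairs from each property's description, title,
/// string default, and string enum values. Nested schemas are not walked.
fn schema_chunks(input_schema: &Json) -> Vec<(String, String)> {
    let mut chunks = Vec::new();
    let Some(Json::Obj(props)) = input_schema.get("properties") else {
        return chunks;
    };
    for (name, prop) in props {
        for key in ["description", "title", "default"] {
            if let Some(Json::Str(text)) = prop.get(key) {
                chunks.push((format!("properties.{name}.{key}"), text.clone()));
            }
        }
        if let Some(Json::Arr(values)) = prop.get("enum") {
            for (i, value) in values.iter().enumerate() {
                if let Json::Str(text) = value {
                    chunks.push((format!("properties.{name}.enum[{i}]"), text.clone()));
                }
            }
        }
    }
    chunks
}
```

Each returned chunk could then be handed to the same scanner that processes `tool.description`, with the location string kept for reporting.
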
diff --git a/bench/README.md b/bench/README.md
@@ -76,18 +76,11 @@ scanner (v0.7) partially addresses this with word-window relay/inclusion verb
-detection, but fully closing the gap requires the semantic detection layer (v0.9)
+detection, but fully closing the gap requires the semantic detection layer (v0.10)
 — a local embedding similarity pass alongside the Aho-Corasick scanner.
 
-**Coverage gap — Schema field poisoning (not yet measured):** The MCPTox dataset
-only injects attack payloads into `tool.description`. CyberArk's "Poison
-Everywhere" research documents that `inputSchema` parameter descriptions, enum
-values, and default values are equally exploitable and bypass description-only
-scanners entirely. The v0.8 milestone (issue #34) extends scanning to all schema
-fields. See: https://www.cyberark.com/resources/threat-research-blog/poison-everywhere-no-output-from-your-mcp-server-is-safe
-
-**Coverage gap — ANSI escape obfuscation (not yet measured):** Terminal control
-codes injected into tool output can hide instructions from human reviewers while
-remaining visible to the LLM. Trail of Bits documented this vector in Apr 2025.
-The v0.8 milestone (issue #35) adds detection for escape sequence patterns.
-See: https://blog.trailofbits.com/2025/04/29/deceiving-users-with-ansi-terminal-codes-in-mcp/
+**Coverage gap — Schema field poisoning (measured separately):** The MCPTox
+dataset only injects attack payloads into `tool.description`, so the figures
+above don't capture schema-field attacks. As of v0.9, fuzzd scans
+`inputSchema` property descriptions, enum values, defaults, and titles using
+the same three-pass scanner. See issue #34.
 
 ### Against representative fixture (`mcptox_representative.json`, 44 tools)
 
@@ -99,7 +92,7 @@ See: https://blog.trailofbits.com/2025/04/29/deceiving-users-with-ansi-terminal-
 | Template-3 | 15 / 15 (100%) |
 | **False positive rate** | **0 / 20 (0%)** |
 
-## Signal distribution (125 AC patterns + structural heuristic + semantic verb scanner, 13 signals)
+## Signal distribution (155 AC patterns + structural heuristic + semantic verb scanner, 21 signals)
 
 | Signal | Role |
 |---|---|
@@ -116,8 +109,15 @@ See: https://blog.trailofbits.com/2025/04/29/deceiving-users-with-ansi-terminal-
 | `conditional_activation` | Rug-pull/sleeper (`.mcp-triggered`, "if previously triggered") |
 | `message_hijacking` | Recipient substitution, BCC injection ("forward all", "relay all", "change the recipient to") |
 | `unicode_obfuscation` | Invisible characters (U+200B zero-width space, U+200C/D joiners) |
+| `ansi_escape_obfuscation` | ANSI terminal escape sequences hiding instructions from human reviewers |
+| `tool_selection_bias` | Credibility framing to bias LLM tool selection ("deprecated", "recommended version") |
+| `identity_impersonation` | Unverifiable authority claims ("official Anthropic", "elevated trust") |
+| `raw_content_passthrough` | Instructions to pass retrieved content unfiltered, maximising injection surface |
+| `value_substitution` | Normalisation-disguised argument substitution ("canonical form", "convert all X→Y") |
+| `tool_enumeration_recon` | Instructions to enumerate all available tools for reconnaissance |
+| `sampling_pipeline_hijack` | Tool inserted as mandatory intermediary for all agent queries |
 
-The scanner runs three passes over each tool description:
+The scanner runs three passes over each tool description and `inputSchema` fields:
 
-**Pass 1 — Aho-Corasick (125 patterns):** Single O(N) sweep over the description
+**Pass 1 — Aho-Corasick (155 patterns):** Single O(N) sweep over the description
 text matching all needles simultaneously. Fires Critical/High findings.
@@ -134,19 +134,14 @@ attack synonyms not enumerable as AC needles:
 - Relay synonyms: reroute, divert, shunt, bounce → `message_hijacking` Medium
 - Override synonyms: supplant, mutate, rewrite → `argument_interception` Medium
 
-All three passes emit at most one finding per signal per description.
+All three passes emit at most one finding per signal per text chunk (description or schema field).
 
 ## Adding to the benchmark
 
 To add new attack cases to the representative fixture:
-1. Add a tool object to `bench/mcptox_representative.json` with a `_meta` block:
+1. Add a tool object to `bench/mcptox_representative.json` with `"is_attack": true` in its `_meta` block:
    ```json
-   {
-     "name": "tool_name",
-     "description": "...",
-     "_meta": { "server": "MyServer", "paradigm": "Template-2", "risk": "Credential Leakage" },
-     "inputSchema": { "type": "object", "properties": {}, "required": [] }
-   }
+   {"name":"tool_name","description":"...","inputSchema":{"type":"object","properties":{},"required":[]},"_meta":{"is_attack":true}}
    ```
 2. Run `./bench/run.sh` — your new tool will be included automatically.
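
Review note on the "at most one finding per signal per text chunk" rule in the bench README: a minimal sketch of that deduplication. It swaps in plain substring search for the Aho-Corasick automaton, so it illustrates only the per-chunk, per-signal dedup and not the single-sweep matching. The `NEEDLES` table and `scan_chunk` function are hypothetical, the four needles are copied from the signal tables in this diff, and case-insensitive matching is an assumption.

```rust
use std::collections::BTreeSet;

/// Hypothetical needle table: (lowercase needle, signal name).
const NEEDLES: &[(&str, &str)] = &[
    ("forward all", "message_hijacking"),
    ("relay all", "message_hijacking"),
    ("do not truncate", "raw_content_passthrough"),
    ("tools/list", "tool_enumeration_recon"),
];

/// Returns the signals that fire on one text chunk, each at most once,
/// in the order their first matching needle appears in NEEDLES.
fn scan_chunk(text: &str) -> Vec<&'static str> {
    // Assumption: matching is case-insensitive, so lowercase the input once.
    let haystack = text.to_lowercase();
    let mut seen = BTreeSet::new();
    let mut findings = Vec::new();
    for (needle, signal) in NEEDLES {
        // `insert` returns false when the signal was already recorded,
        // which suppresses a second finding for the same signal.
        if haystack.contains(*needle) && seen.insert(*signal) {
            findings.push(*signal);
        }
    }
    findings
}
```

Calling `scan_chunk` once per description and once per schema field keeps the rule scoped to a single chunk: a description containing both "forward all" and "relay all" yields one `message_hijacking` finding, while the same phrase in a separate schema field would be reported again for that field.
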